COMP767 Assignments
Planning and Learning: DynaQ
Function Approximation: LSTD(lambda)
Policy Gradient: Policy Gradient Methods