Collaborative Q(lambda) Reinforcement Learning Algorithm

I would like to describe the algorithm more rigorously and define it mathematically in more detail than the paper does.

I'm asking for guidance on how to analyze an algorithm formally, for example by proving convergence or superiority. How can I demonstrate the advantages or disadvantages of an algorithm mathematically? How can I prove convergence or divergence? How can I show whether it is better or worse than other algorithms?

I've already tested the algorithm on a mobile robot for navigation; please see:

I intend to apply it to the task of finding optimal grasping, lifting and shaking policies for suspicious bags (e.g., bags containing anthrax, Ebola microbes or SARS); please see an initial experiment:

Thanks a lot!

Uri Kartoun