In MANETs, congestion typically occurs at the nodes interconnecting two or more groups of nodes. Routing around these congested nodes via alternate, possibly longer, paths increases throughput, e.g., by 50% in the canonical 9-node 2-ring scenario. OLSR-Q combines the OLSR routing protocol with a reinforcement learning (RL) agent that learns the most appropriate link-state or "Directional Air Time" metric to avoid the congested nodes. The challenges for the RL agent are (1) to avoid congestion before packets are dropped and (2) to minimize the number of real-valued or discrete observations or states. In this paper, three simplified OLSRd2-Qx versions are presented and compared to OLSRd2 and to a centralized algorithm, ODRb (Omniscient Dijkstra Routing-balanced). The proposed OLSRd2-Qload algorithm achieves the expected 50% throughput increase on the 9-node 2-ring scenario with a specific test traffic pattern. On the NATO IST-124 Anglova scenario, using an acknowledged-message application, the Q-learning agents still require improvement. The superior results of the centralized load-balancing approach taken in ODRb will be investigated to train multi-agent systems, including OLSR-Q.
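
To make the Q-learning component concrete, the following is a minimal sketch of a tabular Q-learning agent that learns per-neighbor link costs from a discretized congestion observation. The state encoding (queue-length buckets), the reward signal, the class and function names, and the hyperparameters are illustrative assumptions for exposition; they are not the OLSR-Q specification from the paper.

```python
import random

# Illustrative hyperparameters (assumptions, not from the paper)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

class LinkCostAgent:
    """Hypothetical per-node agent: learns which neighbor to prefer
    as next hop, given a discrete local congestion state."""

    def __init__(self, neighbors):
        # Q[(state, neighbor)] ~ expected discounted reward of
        # forwarding via that neighbor in that congestion state.
        self.neighbors = neighbors
        self.q = {}

    def choose(self, state):
        # Epsilon-greedy next-hop selection.
        if random.random() < EPSILON:
            return random.choice(self.neighbors)
        return max(self.neighbors,
                   key=lambda n: self.q.get((state, n), 0.0))

    def update(self, state, neighbor, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, n), 0.0)
                        for n in self.neighbors)
        old = self.q.get((state, neighbor), 0.0)
        self.q[(state, neighbor)] = old + ALPHA * (
            reward + GAMMA * best_next - old)

def discretize_load(queue_len, buckets=(2, 5, 10)):
    """Map a queue length onto a few discrete states, keeping the
    observation space small (challenge 2 in the abstract)."""
    for i, bound in enumerate(buckets):
        if queue_len < bound:
            return i
    return len(buckets)
```

Keeping the state space to a handful of load buckets reflects the second stated challenge: a small discrete observation space lets the table converge within the limited traffic a node observes.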