Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II

Kavli Affiliate: Ran Wang

| First 5 Authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang

| Summary:

This paper proposes a weight-aware deep reinforcement learning (WADRL)
approach designed to address the multiobjective vehicle routing problem with
time windows (MOVRPTW), aiming to use a single deep reinforcement learning
(DRL) model to solve the entire multiobjective optimization problem. The
non-dominated sorting genetic algorithm II (NSGA-II) is then employed to
refine the solutions produced by WADRL, thereby mitigating the limitations
of both approaches. First, we design an MOVRPTW model that balances the
minimization of travel cost and the maximization of customer satisfaction.
Subsequently, we present a novel DRL framework that incorporates a
transformer-based policy network. This network is composed of an encoder
module, a weight embedding module where the weights of the objective functions
are incorporated, and a decoder module. NSGA-II is then utilized to optimize
the solutions generated by WADRL. Finally, extensive experimental results
demonstrate that our method outperforms existing and traditional methods.
Because VRPTW has numerous constraints, generating initial solutions for
the NSGA-II algorithm can be time-consuming. However, using solutions
generated by WADRL as initial solutions for NSGA-II significantly reduces
this initialization time. Meanwhile, the NSGA-II algorithm can
enhance the quality of solutions generated by WADRL, resulting in solutions
with better scalability. Notably, the weight-aware strategy significantly
reduces the training time of DRL while achieving better results, enabling a
single DRL model to solve the entire multiobjective optimization problem.
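
To make the two ideas in the summary concrete, here is a minimal Python sketch of weight-based scalarization of the two objectives and of the non-dominated filter at the core of NSGA-II's sorting step. It is an illustration under stated assumptions, not the paper's implementation: the `(travel_cost, satisfaction)` tuple layout, the function names, and the sample values are all hypothetical, and the weighted sum is only one plausible way the objective weights could be combined.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (travel_cost to minimize, satisfaction to maximize).
Objectives = Tuple[float, float]


def scalarize(obj: Objectives, weights: Sequence[float]) -> float:
    """Weighted-sum score to minimize (an assumed form, not taken from the paper)."""
    cost, satisfaction = obj
    w_cost, w_sat = weights
    return w_cost * cost - w_sat * satisfaction


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Return the first Pareto front, i.e. the rank-1 set in non-dominated sorting."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up objective values standing in for routes generated under different weights.
candidates = [(10.0, 0.9), (12.0, 0.95), (11.0, 0.8), (15.0, 0.95)]
front = non_dominated(candidates)
best_for_cost = min(candidates, key=lambda o: scalarize(o, (1.0, 0.0)))
```

In this sketch, changing the weight pair passed to `scalarize` selects a different trade-off between cost and satisfaction, while `non_dominated` keeps only the candidates that no other candidate beats on both objectives, which is the kind of set a seeded NSGA-II run would then try to improve.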

| Search Query: ArXiv Query: search_query=au:"Ran Wang"&id_list=&start=0&max_results=3