
CN112947484A - Visual navigation method and device for mobile robot in intensive pedestrian environment - Google Patents

Visual navigation method and device for mobile robot in intensive pedestrian environment

Info

Publication number
CN112947484A
CN112947484A
Authority
CN
China
Prior art keywords
global path
robot
path planning
mobile robot
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110347180.0A
Other languages
Chinese (zh)
Inventor
刘奇
李衍杰
庞玺政
陈美玲
吕少华
陈时雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110347180.0A priority Critical patent/CN112947484A/en
Publication of CN112947484A publication Critical patent/CN112947484A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0253Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting relative motion information from a plurality of images taken successively, e.g. visual odometry, optical flow
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Electromagnetism (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a navigation method for a mobile robot in a dense pedestrian environment based on reinforcement learning and traditional path planning, in which global path planning is performed with a traditional planner and local path planning with reinforcement learning. The reinforcement learning component learns the complex motion of pedestrians in the environment, enabling autonomous obstacle avoidance by the mobile robot and hence navigation in a dynamic environment. The invention can quickly make obstacle-avoidance decisions among dense pedestrians and expands the application scenarios of mobile robots.

Description

Visual navigation method and device for mobile robot in intensive pedestrian environment
Technical Field
The invention belongs to the field of mobile robot navigation, relates to a visual navigation method and a visual navigation device of a mobile robot in a dense pedestrian environment, and particularly relates to a visual navigation method and a visual navigation device of a mobile robot in a dense pedestrian environment based on reinforcement learning and traditional path planning.
Background
Traditional robot navigation consists mainly of two steps: path planning and trajectory tracking. In the first step, given a map of the current static environment, an optimal collision-free path is planned from the robot's current position to the target position; "optimal" can be measured in various ways, such as shortest path or lowest energy consumption. In the second step, motion is planned along the trajectory generated in the first step, subject to the robot's kinematic constraints. Global path planning based on a static map can ensure basic optimality of the path, and although traditional obstacle-avoidance methods can avoid collisions with surrounding obstacles using local information, global optimality generally cannot be guaranteed. Moreover, trajectory tracking over-emphasizes the flexibility of local planning and easily becomes trapped in local optima, degrading navigation performance. Specific disadvantages of these methods are as follows: (1) the algorithms contain a large number of parameters that must be tuned manually, which makes them very sensitive to scene changes and unable to adapt automatically to different scenes; (2) even for a single scene, tuning acceptable action-decision parameters requires extensive experience and complex experimentation.
To address the above problems, more researchers have turned to learning-based methods for navigation environments that are difficult to tune manually. Many scholars have tried both imitation learning and reinforcement learning to solve decision-making problems in complex environments. Imitation learning based on supervised learning uses an artificial neural network to fit a state-to-action mapping from a large number of expert samples, enabling the robot to produce acceptably reasonable actions in complex environments.
It should be noted that supervised learning assumes the samples are independent and identically distributed. However, the temporal correlation between consecutive state-action pairs in sequential decision making is so strong that the samples cannot effectively satisfy this assumption. Multi-modal problems in decision making also limit the generalization and applicability of imitation learning.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a visual navigation method for a mobile robot in a dense pedestrian environment. Reinforcement learning is used to learn the complex motion of pedestrians in the environment, realizing autonomous obstacle avoidance for the mobile robot and hence navigation in dynamic environments, and expanding the application scenarios of mobile robots.
In order to achieve the above object, an embodiment of the present invention provides a visual navigation method for a mobile robot in a dense pedestrian environment, including the following steps:
s101, acquiring a static environment map where the robot is located and the starting point position and the target point position of the robot;
s102, planning a global path for the starting-target point pair by using a Dijkstra algorithm;
s103, generating a plurality of global path points on the planned global path according to a fixed distance for subsequent local path planning;
and S104, carrying out local path planning by using a PPO algorithm to follow the global path.
Further, the Dijkstra algorithm inputs parameters including a static map, a current moment position of the robot and a target point position, and outputs parameters as global path points;
the PPO algorithm has the input parameters of a 2D RGB image, the current time position of the robot and the position of a global path point nearest to the robot, and the output parameters of the speed and the direction of the mobile robot.
Further, the global path planning considers only the static map in which the mobile robot is located, and generates the global path to be followed using the Dijkstra algorithm, a traditional path planning method.
Furthermore, the local path planning completes a local navigation task between each pair of consecutive global path points. The motion of pedestrians near the mobile robot is inferred from 2D RGB image data returned by the RGB camera, which, combined with the robot's current position and the position of the nearest global path point, is input to the PPO algorithm, so that the reinforcement learning decision network can flexibly avoid surrounding static obstacles and pedestrians while following the planned global path.
Further, the 2D RGB image data input to the PPO algorithm is subjected to an attention mechanism in advance to extract visual features.
The embodiment of the invention also provides a visual navigation device of the mobile robot in the dense pedestrian environment, which comprises the following modules:
the acquisition module is used for acquiring a static environment map where the robot is located and the starting point position and the target point position of the robot;
a global path planning module for planning a global path for the start-target point pair using Dijkstra algorithm;
a global path point generating module, configured to generate a plurality of global path points on the planned global path according to a fixed distance, so as to be used for subsequent local path planning;
and the local path planning module is used for carrying out local path planning by adopting a PPO algorithm to follow the global path.
Compared with the prior art, the main advantage of the invention is that it combines reinforcement learning with traditional path planning, which both ensures the global optimality of the navigation path and enables the mobile robot to flexibly avoid dynamic obstacles such as pedestrians, while also generalizing well and adapting to changing working environments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a visual navigation method of a mobile robot in a dense pedestrian environment according to the present invention;
FIG. 2 is a diagram of a simulation environment system architecture upon which the present invention is based;
fig. 3 is a functional block diagram of a visual navigation device of a mobile robot in a dense pedestrian environment according to the present invention.
Detailed Description
To help those skilled in the art understand and implement the present invention, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Example one
A flow chart of the inventive reinforcement-learning navigation method for a mobile robot in a dense pedestrian environment is shown in FIG. 1; the method comprises the following steps:
s101, acquiring a static environment map where the robot is located and the starting point position and the target point position of the robot;
s102, planning a global path for the starting-target point pair by using a Dijkstra algorithm;
s103, generating a plurality of global path points on the planned global path according to a fixed distance for subsequent local path planning;
and S104, carrying out local path planning by using a PPO algorithm to follow the global path.
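The patent does not detail how step S103's fixed-distance waypoints are produced. One minimal way to resample the planner's polyline at a fixed arc-length spacing is sketched below in Python; the function name and the (x, y)-tuple path representation are assumptions for illustration, not the patent's own data structures.

```python
import math

def sample_waypoints(path, spacing):
    """Resample a polyline path into points a fixed arc-length apart.

    `path` is a list of (x, y) tuples from the global planner; `spacing`
    is the fixed distance between consecutive waypoints (step S103).
    """
    if not path:
        return []
    waypoints = [path[0]]
    carried = 0.0  # distance covered since the last emitted waypoint
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        # Emit waypoints while the remaining segment still spans the spacing.
        while carried + seg >= spacing:
            t = (spacing - carried) / seg
            x0, y0 = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            seg -= spacing - carried
            carried = 0.0
            waypoints.append((x0, y0))
        carried += seg
    return waypoints
```

For a straight 10 m path with 3 m spacing this yields waypoints at 0, 3, 6, and 9 m; the local planner (step S104) then navigates between consecutive pairs.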
Dijkstra's algorithm was proposed in 1959 by the Dutch computer scientist Edsger W. Dijkstra, after whom it is named. It computes the shortest path from one vertex to all other vertices, solving the shortest-path problem in a weighted graph. Its main characteristic is a greedy strategy: starting from the source, it repeatedly expands the unvisited vertex nearest to the source and relaxes the distances of that vertex's neighbors, until the target is reached. The Dijkstra algorithm in step S102 takes as input the static map, the robot's current position, and the target position, and outputs the global path points.
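As a concrete illustration of the greedy expansion described above, the following is a minimal Dijkstra sketch in Python over a weighted-graph abstraction of the static map (e.g. grid cells as nodes). The adjacency-list representation is an assumption for illustration; the patent does not specify its map data structure.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path via Dijkstra's greedy nearest-first expansion.

    `graph` maps each node to a list of (neighbor, edge_cost) pairs.
    Returns (cost, path) or (float('inf'), []) if goal is unreachable.
    """
    frontier = [(0.0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while frontier:
        # Greedy step: pop the unvisited node nearest to the start.
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        # Relax the distances of the node's unvisited neighbors.
        for neighbor, edge_cost in graph[node]:
            if neighbor not in visited:
                heapq.heappush(frontier,
                               (cost + edge_cost, neighbor, path + [neighbor]))
    return float('inf'), []
```

On a small graph A→B(1), A→C(4), B→C(2), B→D(5), C→D(1), the call `dijkstra(graph, 'A', 'D')` returns cost 4 along A→B→C→D rather than the direct but costlier edges.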
The PPO (Proximal Policy Optimization) algorithm was proposed by OpenAI. It is an on-policy deep reinforcement learning algorithm based on policy-gradient optimization for continuous or discrete action spaces, and belongs to the family of stochastic-policy DRL (Deep Reinforcement Learning) algorithms. It not only performs well (especially on continuous control problems) but is also easier to implement than the earlier TRPO method. The PPO algorithm in step S104 takes as input the 2D RGB image, the robot's current position, and the position of the global path point nearest to the robot, and outputs the speed and direction of the mobile robot.
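The property that makes PPO easier to implement than TRPO is its clipped surrogate objective, which replaces TRPO's explicit trust-region constraint with a simple clip on the policy ratio. A minimal NumPy sketch of that loss follows; it illustrates the standard PPO objective, not the patent's specific network.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective, returned as a loss (negated).

    `ratio` is pi_new(a|s) / pi_old(a|s) per sample; `advantage` is the
    estimated advantage. Clipping the ratio to [1-eps, 1+eps] keeps each
    update close to the behavior policy, which is what replaces TRPO's
    trust-region constraint.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic bound: take the smaller of the two surrogate terms.
    return -np.minimum(unclipped, clipped).mean()
```

With a positive advantage, a ratio above 1 + eps gains nothing further (the clip caps the incentive); with a negative advantage, a ratio below 1 - eps is likewise not rewarded, so the policy cannot move too far in one update.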
Further, the global path planning considers only the static map in which the mobile robot is located, and generates the global path to be followed using the Dijkstra algorithm, a traditional path planning method.
Furthermore, the local path planning completes a local navigation task between each pair of consecutive global path points. The motion of pedestrians near the mobile robot is inferred from 2D RGB image data returned by the RGB camera, which, combined with the robot's current position and the position of the nearest global path point, is input to the PPO algorithm, so that the reinforcement learning decision network can flexibly avoid surrounding static obstacles and pedestrians while following the planned global path.
Further, the 2D RGB image data input to the PPO algorithm is subjected to an attention mechanism in advance to extract visual features.
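The patent does not specify the form of its attention mechanism. As a hedged illustration only, a generic soft spatial attention that pools an image feature map into a single visual feature vector, weighting salient regions (such as pedestrians) more heavily, could be sketched as:

```python
import numpy as np

def spatial_attention(features):
    """Soft spatial attention over a feature map of shape (H, W, C).

    Scores each spatial cell by the L2 norm of its feature vector,
    normalizes the scores with a softmax, and returns the attention-
    weighted sum: a C-dimensional visual feature emphasizing salient
    cells. This is a generic sketch, not the patent's mechanism.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    scores = np.linalg.norm(flat, axis=1)
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ flat                       # shape (C,)
```

In the pipeline described above, such a pooled feature (rather than the raw image) would be concatenated with the robot pose and nearest-waypoint position before entering the PPO network.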
Simulation is carried out with the simulation environment system shown in FIG. 2. By combining reinforcement learning with the traditional path planning method, the invention ensures global optimality of the navigation path while enabling the mobile robot to flexibly avoid dynamic obstacles such as pedestrians; it also generalizes well and can adapt to changing working environments.
Example two
As shown in fig. 3, the visual navigation device of a mobile robot in a dense pedestrian environment of the present invention includes the following modules:
the acquisition module 301 is configured to acquire a static environment map where the robot is located and the starting and target point positions of the robot;
a global path planning module 302 for planning a global path for the start-target point pair using Dijkstra's algorithm;
a global path point generating module 303, configured to generate a plurality of global path points on the planned global path according to a fixed distance, so as to be used for subsequent local path planning;
and a local path planning module 304, configured to perform local path planning by using a PPO algorithm to follow the global path.
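The four modules of FIG. 3 can be wired together as in the following Python sketch. The module callables are hypothetical stand-ins; only the control flow (global Dijkstra plan → fixed-distance waypoints → PPO decision per waypoint) follows the description above.

```python
class VisualNavigator:
    """Sketch of the device of FIG. 3: acquisition (301), global path
    planning (302), waypoint generation (303), local planning (304).
    The injected callables are assumed interfaces, not the patent's."""

    def __init__(self, global_planner, waypoint_gen, local_policy):
        self.global_planner = global_planner  # e.g. Dijkstra on static map
        self.waypoint_gen = waypoint_gen      # fixed-distance resampling
        self.local_policy = local_policy      # e.g. PPO decision network

    def navigate(self, static_map, start, goal, get_observation, spacing=1.0):
        """Run one navigation episode; returns the velocity commands issued."""
        path = self.global_planner(static_map, start, goal)   # module 302
        waypoints = self.waypoint_gen(path, spacing)          # module 303
        commands = []
        for waypoint in waypoints:                            # module 304
            image, pose = get_observation()  # camera frame + robot pose (301)
            commands.append(self.local_policy(image, pose, waypoint))
        return commands
```

Separating the planner, waypoint generator, and local policy behind callables mirrors the module decomposition of the device claim, so each part can be swapped or tested in isolation.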
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, modules and units may refer to the corresponding processes of the foregoing method embodiments, and are not described herein again.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart and block diagrams may represent a module, segment, or portion of code, which comprises one or more computer-executable instructions for implementing the logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention, and is provided by way of illustration only and not limitation. It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (10)

1. A visual navigation method of a mobile robot in a dense pedestrian environment is characterized in that the method adopts a mode of combining traditional path planning and reinforcement learning to respectively carry out global path planning and local path planning, and the method comprises the following steps:
s101, acquiring a static environment map where the robot is located and the starting point position and the target point position of the robot;
s102, planning a global path for the starting-target point pair by using a Dijkstra algorithm;
s103, generating a plurality of global path points on the planned global path according to a fixed distance for subsequent local path planning;
and S104, carrying out local path planning by using a PPO algorithm to follow the global path.
2. The method of claim 1, wherein: the Dijkstra algorithm input parameters comprise a static map, the current moment position and the target point position of the robot, and the output parameters are global path points; the PPO algorithm comprises input parameters of a 2D RGB image, the current time position of the robot and the position of a global path point nearest to the robot, and output parameters of the PPO algorithm are the speed and the direction of the mobile robot.
3. The method of claim 2, wherein: the global path planning only considers a static map where the mobile robot is located, and generates a global path of an initial-target point pair to be realized by using a Dijkstra algorithm in a traditional path planning method.
4. The method of claim 2, wherein: the local path planning is used for completing a local navigation task between every two global path points, the motion condition of pedestrians nearby the mobile robot is judged according to 2D RGB image data returned by the RGB camera, and a PPO algorithm is input by combining the current position of the robot and the position of the nearest global path point, so that the reinforcement learning decision network can flexibly avoid surrounding static obstacles and pedestrians while following the planned global path.
5. The method according to any one of claims 1-4, wherein: the 2D RGB image data input into the PPO algorithm is subjected to an attention mechanism in advance to extract visual features.
6. A visual navigation device for a mobile robot in a dense pedestrian environment, the device comprising:
the acquisition module is used for acquiring a static environment map where the robot is located and the starting point position and the target point position of the robot;
a global path planning module for planning a global path for the start-target point pair using Dijkstra algorithm;
a global path point generating module, configured to generate a plurality of global path points on the planned global path according to a fixed distance, so as to be used for subsequent local path planning;
and the local path planning module is used for carrying out local path planning by adopting a PPO algorithm to follow the global path.
7. The apparatus of claim 6, wherein: the Dijkstra algorithm input parameters comprise a static map, the current moment position and the target point position of the robot, and the output parameters are global path points; the PPO algorithm comprises input parameters of a 2D RGB image, the current time position of the robot and the position of a global path point nearest to the robot, and output parameters of the PPO algorithm are the speed and the direction of the mobile robot.
8. The apparatus of claim 7, wherein: the global path planning only considers a static map where the mobile robot is located, and generates a global path of an initial-target point pair to be realized by using a Dijkstra algorithm in a traditional path planning method.
9. The apparatus of claim 7, wherein: the local path planning is used for completing a local navigation task between every two global path points, the motion condition of pedestrians nearby the mobile robot is judged according to 2D RGB image data returned by the RGB camera, and a PPO algorithm is input by combining the current position of the robot and the position of the nearest global path point, so that the reinforcement learning decision network can flexibly avoid surrounding static obstacles and pedestrians while following the planned global path.
10. The apparatus according to any one of claims 6-9, wherein: the 2D RGB image data input into the PPO algorithm is subjected to an attention mechanism in advance to extract visual features.
CN202110347180.0A 2021-03-31 2021-03-31 Visual navigation method and device for mobile robot in intensive pedestrian environment Pending CN112947484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347180.0A CN112947484A (en) 2021-03-31 2021-03-31 Visual navigation method and device for mobile robot in intensive pedestrian environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110347180.0A CN112947484A (en) 2021-03-31 2021-03-31 Visual navigation method and device for mobile robot in intensive pedestrian environment

Publications (1)

Publication Number Publication Date
CN112947484A true CN112947484A (en) 2021-06-11

Family

ID=76231354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110347180.0A Pending CN112947484A (en) 2021-03-31 2021-03-31 Visual navigation method and device for mobile robot in intensive pedestrian environment

Country Status (1)

Country Link
CN (1) CN112947484A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222719A (en) * 2019-05-10 2019-09-10 中国科学院计算技术研究所 A kind of character recognition method and system based on multiframe audio-video converged network
CN111142542A (en) * 2020-01-15 2020-05-12 苏州晨本智能科技有限公司 Omnidirectional mobile robot autonomous navigation system based on VFH local path planning method
CN111780777A (en) * 2020-07-13 2020-10-16 江苏中科智能制造研究院有限公司 Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN111949032A (en) * 2020-08-18 2020-11-17 中国科学技术大学 3D obstacle avoidance navigation system and method based on reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QI LIU ETAL: "A 3D Simulation Environment and Navigation Approach for Robot Navigation via Deep Reinforcement Learning in Dense Pedestrian Environment", 《CASE》 *
LIU QIONG ET AL.: "Pedestrian target detection based on visual attention model computation", Journal of Beijing Information Science and Technology University (Natural Science Edition) *
ZHAO QIAN ET AL.: "Pedestrian target detection based on visual attention mechanism", Computer Simulation *

Similar Documents

Publication Publication Date Title
CN111670468B (en) Moving body behavior prediction device and moving body behavior prediction method
CN110007675B (en) Vehicle automatic driving decision-making system based on driving situation map and training set preparation method based on unmanned aerial vehicle
Smith et al. Visual tracking for intelligent vehicle-highway systems
Tan et al. Color model-based real-time learning for road following
US20190361456A1 (en) Control systems, control methods and controllers for an autonomous vehicle
US20190361439A1 (en) Control systems, control methods and controllers for an autonomous vehicle
US20190361454A1 (en) Control systems, control methods and controllers for an autonomous vehicle
JP2020126639A (en) Learning method for supporting safe autonomous driving, learning device, testing method, and testing device using the same
CN108983781A (en) A kind of environment detection method in unmanned vehicle target acquisition system
CN109313857A (en) Surrounding enviroment identification device
JP7092383B2 (en) Methods and devices for performing seamless parameter changes by selecting a position-based algorithm to perform optimized autonomous driving in each region.
US11687079B2 (en) Methods, devices, and systems for analyzing motion plans of autonomous vehicles
CN111830979A (en) Trajectory optimization method and device
US12124269B2 (en) Systems and methods for simultaneous localization and mapping using asynchronous multi-view cameras
CN113139696B (en) Trajectory prediction model construction method and trajectory prediction method and device
Nuss et al. Consistent environmental modeling by use of occupancy grid maps, digital road maps, and multi-object tracking
KR20160048530A (en) Method and apparatus for generating pathe of autonomous vehicle
CN108694723A (en) A kind of target in complex environment tenacious tracking method
Cardarelli et al. Multisensor data fusion for obstacle detection in automated factory logistics
CN113741453A (en) Path planning method, device, equipment and medium for unstructured environment
Rohrmuller et al. Probabilistic mapping of dynamic obstacles using markov chains for replanning in dynamic environments
Zipfl et al. Relation-based motion prediction using traffic scene graphs
Narksri et al. Occlusion-aware motion planning with visibility maximization via active lateral position adjustment
Chen et al. Multiple target tracking in occlusion area with interacting object models in urban environments
WO2022089627A1 (en) Method and system for motion planning for an autonmous vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210611)