Topic 4: LLM-Based Decision Making & Reasoning
Topic 4 integrates Large Language Models (LLMs) into your autonomy stack to translate natural-language instructions into structured task graphs, support closed-loop autonomy, and optimize skills via reinforcement-style feedback.
4.1 Natural Language → Task Graph Translation (Module A)
From Commands to Structured Actions
Given a user instruction such as:
- “Go to the kitchen, find the red mug, and bring it to me in the lab.”
The LLM’s job is to:
- Parse the instruction.
- Identify subtasks and their order.
- Produce a structured representation (e.g., a task graph or behavior tree skeleton).
Example breakdown:
1. navigate(kitchen)
2. locate_object(mug, color=red)
3. pick(mug)
4. navigate(lab)
5. deliver(mug, recipient=human)
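In code, one lightweight representation of this breakdown is an ordered list of skill calls. The `SkillCall` dataclass below is an illustrative sketch, not a prescribed API:

```python
from dataclasses import dataclass, field

@dataclass
class SkillCall:
    """One node in a linear task graph: a skill name plus its arguments."""
    skill: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

# The mug-fetching instruction as an ordered task graph
task_graph = [
    SkillCall("navigate", ("kitchen",)),
    SkillCall("locate_object", ("mug",), {"color": "red"}),
    SkillCall("pick", ("mug",)),
    SkillCall("navigate", ("lab",)),
    SkillCall("deliver", ("mug",), {"recipient": "human"}),
]
```

A richer task graph would add dependency edges between nodes; a flat list is sufficient for strictly sequential plans like this one.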
You will conceptually design:
- Prompts that:
  - Provide the robot’s capabilities (skill library).
  - Ask the LLM to produce valid sequences using those skills.
- A ROS 2 node that:
  - Receives user commands.
  - Calls the LLM (local or remote).
  - Converts the output into a task graph or behavior tree configuration.
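The conversion step can be prototyped as a plain validation function that the ROS 2 node’s command callback would wrap. The JSON plan format and the `SKILL_LIBRARY` contents below are illustrative assumptions, not a fixed interface:

```python
import json

# Hypothetical skill library: skill name -> allowed argument keys
SKILL_LIBRARY = {
    "navigate": {"goal"},
    "locate_object": {"name", "color"},
    "pick": {"name"},
    "deliver": {"name", "recipient"},
}

def parse_plan(llm_output: str) -> list:
    """Parse the LLM's JSON plan, rejecting any step outside the skill library."""
    steps = json.loads(llm_output)
    for step in steps:
        skill = step.get("skill")
        if skill not in SKILL_LIBRARY:
            raise ValueError(f"unknown skill: {skill!r}")
        unknown = set(step.get("args", {})) - SKILL_LIBRARY[skill]
        if unknown:
            raise ValueError(f"{skill}: unexpected args {unknown}")
    return steps

# A well-formed response passes validation; malformed ones raise early
plan = parse_plan('[{"skill": "navigate", "args": {"goal": "kitchen"}}]')
```

Validating before execution means a hallucinated skill fails fast at parse time instead of mid-mission.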
Prompt Engineering for Robotics Context
Key ideas:
- Constrain outputs to known skills and actions.
- Provide examples of valid task graphs.
- Include contextual information:
  - Known rooms and objects.
  - Current map or world state summary.
The goal is to make LLM outputs predictable and easy to validate, not free-form.
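A minimal sketch of such a constrained prompt builder follows; the skill names, rooms, and output schema are illustrative assumptions:

```python
def build_prompt(instruction, skills, rooms, objects):
    """Assemble a constrained prompt: capabilities + context + output format."""
    return (
        "You control a mobile robot. Respond ONLY with a JSON list of steps.\n"
        f"Available skills: {', '.join(skills)}\n"
        f"Known rooms: {', '.join(rooms)}\n"
        f"Visible objects: {', '.join(objects)}\n"
        'Example: [{"skill": "navigate", "args": {"goal": "kitchen"}}]\n'
        f"Instruction: {instruction}\n"
    )

prompt = build_prompt(
    "Bring me the red mug from the kitchen.",
    skills=["navigate", "locate_object", "pick", "deliver"],
    rooms=["kitchen", "lab"],
    objects=["red mug", "chair"],
)
```

Listing the allowed skills and a worked example in the prompt is what keeps the output machine-checkable rather than free-form prose.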
4.2 Closed-Loop Autonomy (Module B)
Perception in the Loop
LLM-driven plans must stay grounded in reality:
- Perception and mapping (Chapter 4) provide:
  - Detected objects and humans.
  - Current robot pose and map.
- Execution reports back:
  - Success/failure of skills.
  - Unexpected observations (e.g., object not found).
Closed-loop behavior:
- If the object is not detected:
  - Re-scan or search a different area.
  - Ask the user for clarification if needed.
- If navigation is blocked:
  - Re-plan via alternative routes.
  - Adjust the task graph accordingly.
The LLM (or a higher-level policy) can:
- Suggest alternative strategies.
- Decide when to seek human input.
- Update the task graph in response to new information.
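These recovery rules can be prototyped as a small dispatch function before any LLM is involved; the report fields (`status`, `failure`, `attempts`) are hypothetical names for illustration:

```python
def next_action(report: dict) -> str:
    """Map an execution report to a recovery strategy (simplified policy)."""
    if report.get("status") == "success":
        return "continue"
    failure = report.get("failure")
    if failure == "object_not_found":
        # Re-scan first; escalate to the user after repeated misses.
        return "rescan" if report.get("attempts", 0) < 2 else "ask_user"
    if failure == "path_blocked":
        return "replan_route"
    return "ask_user"  # unknown failure: seek human input
```

In practice an LLM or higher-level policy can replace or augment these hand-written rules, but a deterministic fallback like this keeps the robot’s behavior predictable when the model is unavailable.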
4.3 Reinforcement-Based Task Optimization (Module C)
Reward Signals
Over time, you may want the robot to:
- Execute tasks more efficiently.
- Reduce failures and retries.
- Improve smoothness and safety.
You can define reward signals such as:
- Task success vs failure.
- Time to completion.
- Number of collisions or near-collisions (should be minimized).
- Smoothness of motion (e.g., penalize jerky movements).
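A scalar reward combining these signals might look like the sketch below; the weights are illustrative and would need tuning on your robot:

```python
def compute_reward(success: bool, duration_s: float,
                   collisions: int, jerk_rms: float) -> float:
    """Combine task-level signals into a single scalar reward."""
    reward = 10.0 if success else -10.0
    reward -= 0.1 * duration_s   # faster completions score higher
    reward -= 5.0 * collisions   # heavily penalize contact events
    reward -= 1.0 * jerk_rms     # penalize jerky motion
    return reward
```

Keeping the reward a simple weighted sum makes it easy to inspect which term dominated a given episode during post-hoc analysis.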
Skill Fine-Tuning
Reinforcement-style updates can:
- Adjust low-level controller parameters.
- Refine skill implementations (e.g., better grasp strategies).
- Influence high-level choices (e.g., preferred routes).
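As a concrete (and deliberately simple) example of influencing high-level choices, a bandit-style running average can learn a preferred route from logged rewards; the class and route names here are hypothetical:

```python
from collections import defaultdict

class RoutePreference:
    """Track average reward per route and pick the best-performing one."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, route: str, reward: float) -> None:
        self.totals[route] += reward
        self.counts[route] += 1

    def best(self) -> str:
        return max(self.counts, key=lambda r: self.totals[r] / self.counts[r])

prefs = RoutePreference()
prefs.update("hallway", 4.0)
prefs.update("atrium", 7.5)
prefs.update("hallway", 5.0)
```

A real deployment would add exploration (e.g., epsilon-greedy) so a once-blocked route is eventually retried, but the averaging core stays the same.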
Data logging:
- Store:
  - Task graphs used.
  - Execution traces (sensor data, commands).
  - Outcomes and rewards.
- Use logs for:
  - Offline RL experiments.
  - Post-hoc analysis and manual tuning.
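A minimal JSON-lines episode logger covering these fields might look like this; the record schema is an assumption, and in a real system the raw sensor traces would be recorded separately (e.g., with `ros2 bag`) rather than inlined:

```python
import io
import json
import time

def log_episode(stream, task_graph, outcome, reward):
    """Append one episode record as a JSON line for offline analysis/RL."""
    record = {
        "t": time.time(),
        "task_graph": task_graph,  # e.g., a list of skill-call dicts
        "outcome": outcome,        # "success" / "failure" plus details
        "reward": reward,
    }
    stream.write(json.dumps(record) + "\n")

# Usage with an in-memory stream; a file opened in append mode works the same way
buf = io.StringIO()
log_episode(buf, [{"skill": "navigate", "args": {"goal": "lab"}}], "success", 6.5)
```

One JSON object per line keeps the log appendable during operation and trivially streamable into offline RL or analysis tooling later.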
By the end of Topic 4, you should understand:
- How LLMs can generate and refine task structures.
- How closed-loop autonomy requires continuous perception–planning–execution integration.
- How reinforcement-style feedback can gradually improve your agent’s behavior.