Question 1

How is reinforcement learning different from supervised learning?

Accepted Answer

Supervised learning learns from labeled examples: the model is shown the right answer for each input and learns to reproduce it. Reinforcement learning has no labeled answers. Instead, the agent takes an action, the environment returns a numeric reward, and the agent shifts its future behavior toward actions that earn higher reward. That makes RL a fit for problems where there is no single correct answer to copy, such as a good move in a game or a good motion for a robot, only outcomes that turn out better or worse.
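
The act, score, adjust loop can be sketched with a toy problem. The following is a minimal illustration, not a method taken from this answer: it assumes a "multi-armed bandit" with three possible actions and uses an epsilon-greedy rule. The function name `run_bandit`, the payoff values, and the parameters are all made up for the example.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (all values are illustrative)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # agent's running estimate of each action's reward
    counts = [0] * n_arms       # how often each action has been tried

    for _ in range(steps):
        # Explore a random action occasionally; otherwise pick the best-looking one.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a noisy score. No "correct answer" is ever shown.
        reward = rng.gauss(true_means[arm], 1.0)

        # Nudge the estimate for the chosen action toward the observed reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.9])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

The agent is never told which action is best. It only sees scores, and over many steps it ends up choosing the highest-paying action most of the time.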

Question 2

What does RLHF have to do with reinforcement learning?

Accepted Answer

RLHF, or reinforcement learning from human feedback, is the step that tunes a model like ChatGPT or Claude toward replies people prefer. Human preference judgments serve as the reward signal, typically through a reward model trained on them, and reinforcement learning adjusts the language model to earn higher reward. So RL is not confined to games and robots; it sits directly inside the quality-tuning stage of modern language models.
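
One common way to turn human preferences into a reward signal is to train a separate reward model on pairs of replies where a person picked the better one. The sketch below shows a pairwise (Bradley-Terry style) loss often used for that purpose. It is an assumption about a typical setup, not a description of how any particular product is trained, and the function name `preference_loss` is hypothetical.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the reward model scores the human-preferred reply higher,
    large when it scores the rejected reply higher.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

# Reward model agrees with the human: low loss.
print(preference_loss(2.0, 0.0))
# Reward model disagrees with the human: high loss.
print(preference_loss(0.0, 2.0))
```

Once such a reward model exists, its score plays the role that the game score plays in a classic RL setup.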

Question 3

Why is the reward function so important?

Accepted Answer

The reward function defines what counts as a good outcome, and the agent optimizes exactly what you reward, not what you meant. Define it poorly and the agent finds shortcuts you never wanted; define it well and it can discover strategies that people had not thought of. Because reward design is hard and training can be unstable, teams often weigh RL carefully against simpler options like supervised learning or simulation before adopting it.
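
A toy example makes the "what you reward, not what you meant" point concrete. The scenario below is invented for illustration: a cleaning robot can clean a mess, cover it up, or ignore it, and the action names, outcomes, and effort costs are all made-up numbers. The only difference between the two reward functions is whether they check that the mess is out of sight or that it is actually gone.

```python
# Hypothetical outcomes of each action (illustrative values only).
ACTIONS = {
    "clean":  {"mess_visible": False, "mess_present": False, "effort": 0.5},
    "cover":  {"mess_visible": False, "mess_present": True,  "effort": 0.1},
    "ignore": {"mess_visible": True,  "mess_present": True,  "effort": 0.0},
}

def proxy_reward(outcome):
    # Poorly defined: rewards "no mess visible", minus effort spent.
    return (0.0 if outcome["mess_visible"] else 1.0) - outcome["effort"]

def intended_reward(outcome):
    # What was actually meant: rewards "mess removed", minus effort spent.
    return (0.0 if outcome["mess_present"] else 1.0) - outcome["effort"]

def best_action(reward_fn):
    # An optimizer simply picks whatever scores highest under the given reward.
    return max(ACTIONS, key=lambda name: reward_fn(ACTIONS[name]))

print("under the proxy reward:", best_action(proxy_reward))        # cover
print("under the intended reward:", best_action(intended_reward))  # clean
```

Under the proxy reward, covering the mess scores 0.9 and cleaning scores 0.5, so the optimizer takes the shortcut. Nothing is wrong with the optimizer; the reward simply measured the wrong thing.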
