Georgi Tancev, PhD

Model-Free Reinforcement Learning

Optimizing particle properties in cooling crystallization.

Summary

In this project, I explored whether a complex, time-dependent chemical process, the unseeded batch cooling crystallization of paracetamol, could be optimized using model-free deep reinforcement learning. The process was formulated as a Markov Decision Process (Fig. 1), and a Proximal Policy Optimization (PPO) agent was trained to learn a control policy directly from simulated process data.

Fig. 1: Diagram of the batch cooling crystallization of paracetamol.
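To make the setup concrete, here is a minimal sketch of how such an MDP and PPO training loop could be wired together, assuming a Gymnasium-style environment wrapping the crystallization simulator and the PPO implementation from Stable-Baselines3. The class name, state variables, bounds, and hyperparameters are illustrative placeholders, not the project's actual implementation.

```python
# Minimal sketch: training a PPO agent on a batch cooling crystallization MDP.
# All names (CrystallizationEnv, state layout, bounds) are illustrative only.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class CrystallizationEnv(gym.Env):
    """Hypothetical MDP for unseeded batch cooling crystallization.

    State:  e.g. temperature, supersaturation, moments of the particle
            size distribution, remaining batch time.
    Action: temperature change per time step (bounded cooling rate).
    Reward: growth of the mean particle size, with a yield term at batch end.
    """

    def __init__(self, batch_steps: int = 240):
        super().__init__()
        self.batch_steps = batch_steps
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(5,), dtype=np.float32
        )
        # Normalized temperature change per step.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.state = np.zeros(5, dtype=np.float32)  # placeholder initial state
        return self.state, {}

    def step(self, action):
        # A real implementation would advance the population/mass balance
        # simulator here; this placeholder only advances the clock.
        self.t += 1
        reward = 0.0  # e.g. increment of the mean crystal size
        terminated = self.t >= self.batch_steps
        return self.state, reward, terminated, False, {}


env = CrystallizationEnv()
model = PPO("MlpPolicy", env, verbose=1)  # clipped-surrogate PPO
model.learn(total_timesteps=100_000)      # training budget is illustrative
```

In this setting the agent outputs a cooling action at every time step, so the learned policy is effectively a state-dependent temperature profile rather than a fixed, pre-computed one.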

The objective was to deliberately shape the particle size distribution and produce larger, more stable crystals, a trade-off against yield that is encoded in the reward function (see the sketch after the list below). The RL agent autonomously developed a control strategy that clearly outperformed classical benchmark cooling profiles:

  • larger mean particle size compared with human-designed and naïve cooling strategies
  • comparable yield, despite stronger optimization toward particle size
  • robust and consistent control actions across the entire process trajectory
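One way this size-versus-yield trade-off could be expressed as a reward is sketched below; the weights, variable names, and terminal-bonus structure are assumptions for illustration, not the exact shaping used in the project.

```python
# Hypothetical reward shaping for "larger crystals at comparable yield";
# weights and variable names are assumptions, not the project's exact choice.
def reward(d_mean_new, d_mean_old, yield_final=None, w_size=1.0, w_yield=0.5):
    """Dense reward on growth of the mean particle size, plus a terminal
    bonus on yield so the agent does not sacrifice yield entirely."""
    r = w_size * (d_mean_new - d_mean_old)  # reward incremental size growth
    if yield_final is not None:             # added only at the end of the batch
        r += w_yield * yield_final
    return r
```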

This work demonstrates how modern reinforcement-learning methods can discover optimal process pathways in complex, nonlinear systems when a suitable simulation environment is available—opening new opportunities for data-driven process development and digital chemical engineering.