Model-Free Reinforcement Learning

Optimizing particle properties in cooling crystallization.

Summary

Chemical process design is the search for an optimal manufacturing protocol to perform chemical operations. For transient processes such as crystallization, the optimal conditions can change over time, requiring a dynamic strategy. Model-free deep reinforcement learning is an approach that can be used to identify the best sequence of states with respect to a predefined reward function.

Crystallization is the primary separation and purification process for pharmaceuticals, and particle size is one of the most important properties in drug formulation. For example, electrostatic, flow, and adhesion problems can occur with smaller sizes. Currently, it is not yet possible to computationally design such processes as desired, but there is an interest in trajectories that result in certain particle properties.

In this work, proximal policy optimization is applied in a simulated environment to identify operational strategies that are optimal with respect to the desired particle properties in unseeded batch cooling crystallization processes of paracetamol in ethanol.

**Fig. 1:** Batch cooling crystallization diagram of paracetamol.

The proposed design method based on proximal policy optimization offers a way to obtain such trajectories, capable of generating particles that are up to 60% larger compared to human-based designs. In the long run, this may make drugs more affordable by facilitating drug development.