Conference Dates

June 6-11, 2010


The production of vaccines is a complex biological process, with long cycle times and high variation in raw materials, biological growth rates, and test methods. Long-term shifts or cycles in yield are not unusual, but understanding their causes is important for greater control and predictability. Hundreds of variables are monitored for every batch of vaccine produced; however, the relationships between product quality and the many process variables are difficult to quantify. In this article, we describe how mining historical process data with random forests and partial least squares (PLS) enabled us to identify the drivers of yield for a bulk vaccine. Both methods converged on two key process parameters that accounted for the largest shifts in yield. Taken together, the data mining analyses allowed us to understand the sources of variability in bulk yield and to implement new controls that significantly reduced the variation.
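
As a rough illustration of how a PLS-style analysis can point to yield drivers, the sketch below computes the first weight vector of a single-response PLS model (proportional to X^T y on standardized data) and ranks variables by absolute weight. It is not the analysis described in this abstract: the batch data are synthetic, the variable names (`temp`, `ph`, `feed_rate`, `hold_time`, `media_lot`) and all function names are hypothetical, and the choice of which variables drive yield is an assumption built into the simulation. The random forest half of the analysis is not shown.

```python
import math
import random


def standardize(column):
    """Center a list of numbers and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def pls1_first_weights(columns, response):
    """Return the first PLS1 weight vector, w proportional to X^T y.

    `columns` is a list of predictor columns and `response` is the yield
    column. Both are standardized first so weights are comparable across
    variables measured in different units.
    """
    y = standardize(response)
    xs = [standardize(col) for col in columns]
    raw = [sum(xi * yi for xi, yi in zip(col, y)) for col in xs]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]


def make_synthetic_batches(n_batches=500, seed=0):
    """Simulate batch records. All names and coefficients are hypothetical."""
    rng = random.Random(seed)
    names = ["temp", "ph", "feed_rate", "hold_time", "media_lot"]
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_batches)] for _ in names]
    # Assumption for illustration only: yield depends on temp and hold_time.
    batch_yield = [
        3.0 * columns[0][i] - 2.0 * columns[3][i] + rng.gauss(0.0, 0.5)
        for i in range(n_batches)
    ]
    return names, columns, batch_yield


def rank_drivers(names, columns, response):
    """Pair each variable with its weight, sorted by absolute weight, largest first."""
    weights = pls1_first_weights(columns, response)
    return sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    names, columns, batch_yield = make_synthetic_batches()
    for name, weight in rank_drivers(names, columns, batch_yield):
        print(f"{name:10s} {weight:+.3f}")
```

On real batch records, a full PLS fit with several components and cross-validation would normally replace this one-component shortcut, and agreement with an independent method such as random forest variable importance would strengthen confidence in the ranking.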