Advanced Neural Computing Software Systems: Data Mining in Processing and Formulation

Published in: Pharmaceutical Technology, 11-01-2006, Volume 2006 Supplement, Issue 7

Large amounts of data are currently being generated in an attempt to understand and improve formulation, process, and manufacturing efficiency. This task requires novel data mining software systems tailored specifically for the formulator and process engineer.

The pharmaceutical industry is undergoing a radical change in its ways of working. Directives from the US Food and Drug Administration have raised questions about understanding the relationships between formulation and manufacturing that result in controlled product performance. These relationships, however, can rarely be precisely quantified, and the formulation and manufacturing must be carried out in a design space that is both multidimensional in nature and difficult to conceptualize. Attempts to investigate these problems through the use of experimental design have generated large amounts of data, but processing these data remains a challenge. This article introduces two new implementations of data-mining software packages (INForm and FormRules, Intelligensys, Stokesley, UK) that are specifically tailored for the pharmaceutical formulator or process engineer to generate understandable rules and to model and optimize the process and formulation.

Advanced technologies

Both INForm and FormRules rely on advanced computing techniques such as neural networks, fuzzy logic, and genetic algorithms. Neural networks are mathematical constructs that are capable of learning, for themselves, the relationships within data. No assumptions need be made about the functional form of these relationships because the neural network simply tries out a range of models to determine one that best fits the known data. In recent years, artificial neural networks (ANNs) have increasingly and successfully been used to model complex behavior in problems such as those found in pharmaceutical formulation and processing (1, 2).

Fuzzy logic can be implemented to allow a formulator's objectives to be described in a linguistically intuitive way. Traditional "crisp" logic means that values must be either "true" (1) or "false" (0). Fuzzy logic, based on the theory of fuzzy sets, allows the membership in each set to take a value between 0 and 1. For example, if a tablet disintegration time of <300 s is desired, then a value >300 s will have a desirability of <100%, with the desirability decreasing as the disintegration time increases. This gives a formulator considerable control over an optimization process.
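The disintegration-time example can be sketched as a simple membership function. This is a minimal illustration, not the function used inside INForm: the linear decay and the 600 s point at which desirability reaches zero are assumptions chosen for the example.

```python
def desirability_at_most(value, target=300.0, worst=600.0):
    """Fuzzy desirability for a 'less than target' objective.

    Returns 1.0 (100%) at or below `target` and falls linearly to 0.0
    at `worst`. Both the linear shape and the `worst` limit are
    hypothetical choices made for illustration.
    """
    if value <= target:
        return 1.0
    if value >= worst:
        return 0.0
    return (worst - value) / (worst - target)


# Disintegration times in seconds
for t in (250, 300, 450, 700):
    print(t, "s ->", desirability_at_most(t))
```

A crisp rule would score 450 s as a plain failure. Here it scores 0.5, so an optimizer can still distinguish it from a much worse 700 s.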

As the name implies, genetic algorithms use an evolutionary approach to finding the best solutions. To do this, a measure of fitness is set up, using the desired values for each property together with its importance relative to other properties. The optimization starts with a random trial population, and the fitness of each member in the population is assessed. New solutions are generated from the fittest members, using mathematical operations that are analogous to reproduction and mutation, and their fitness is assessed. In this way, the population evolves so that ultimately the fittest solution is the one that best meets a formulator's specified needs. If there are constraints on the ingredients or processing conditions (for example, if a particular combination of ingredients must sum to 100%), then these can be implemented easily by penalizing the fitness of nonconforming solutions.
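The evolutionary loop and the penalty approach to constraints can be sketched in a few lines. Everything specific here is an assumption made for illustration: the three-ingredient candidate, the linear `toy_property` response standing in for a trained model, the target value of 30, the penalty weight, and the selection scheme. It is not the algorithm implemented in INForm.

```python
import random


def fitness(candidate):
    """Higher is better. `toy_property` stands in for a trained model."""
    a, b, c = candidate
    toy_property = 0.5 * a + 0.2 * b + 0.1 * c   # hypothetical response
    closeness = -abs(toy_property - 30.0)         # hypothetical target: 30
    penalty = 10.0 * abs(a + b + c - 100.0)       # ingredients must sum to 100%
    return closeness - penalty


def mutate(candidate, rng, scale=2.0):
    """Add Gaussian noise to each ingredient, clipped to 0-100%."""
    return tuple(min(100.0, max(0.0, x + rng.gauss(0.0, scale)))
                 for x in candidate)


def crossover(parent1, parent2, rng):
    """Take each ingredient level from one parent or the other."""
    return tuple(rng.choice(pair) for pair in zip(parent1, parent2))


def evolve(generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.uniform(0.0, 100.0) for _ in range(3))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]     # keep the fittest quarter
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print("best candidate:", best, "fitness:", fitness(best))
```

Because the fittest members are carried over unchanged, the best fitness in the population cannot get worse from one generation to the next. Candidates whose ingredients do not sum to 100% are not forbidden outright; they simply score badly and are bred out.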

Combining these technologies allows the development of useful and powerful methodologies. For example, using neural networks for modeling together with genetic algorithms for optimization (as is done within the INForm software system) allows a user to develop a formulation or process to meet stringent, often conflicting, objectives. New methodologies such as neurofuzzy logic (implemented in the FormRules system) have also evolved. Neurofuzzy logic combines the ability of neural networks to "learn" from data with fuzzy logic's capacity to express complex concepts simply, allowing a formulator and process engineer to gain an understanding of the underlying rules governing the formulation and process.

This article examines the applications of the software systems to formulation and processing using data taken from three examples in the literature.

The rules that govern roller compaction

The first case concerns a roller-compaction process for acetaminophen tablets using data published by Turkoglu et al. (3). Because acetaminophen has poor flow and compression characteristics, a prior agglomeration process is generally used. In Turkoglu's study, both the formulation and the process conditions were changed. In the formulation, the binder was one of three possibilities: hydroxypropyl methyl cellulose (HPMC, Methocel, Dow Chemical Co., Midland, MI), polyethylene glycol, or carbomer (Carbopol, Noveon, Cleveland, OH). The percentage of binder and the amount of microcrystalline cellulose (MCC) that was added were varied. One or two passes through the roller compactor were allowed. In all, 42 different experiments were performed, 30 of which were used to develop the models. These 30 experiments were used in FormRules to determine which inputs were most important and to investigate how they affect the measured properties: crushing strength of the tablets, friability, ejection force, and disintegration time. With the default training parameters, good models (as assessed by analysis of variance statistics, which all showed R² > 0.9) were obtained for each property.
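For readers unfamiliar with the R² criterion, the standard coefficient of determination can be computed as below. The observed and predicted values are made up for illustration, and the article does not describe exactly how FormRules calculates its statistics, so this shows only the textbook definition.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values, e.g. a measured property and the model's predictions
observed = [4.0, 6.0, 8.0, 10.0]
predicted = [4.5, 5.5, 8.5, 9.5]
print(round(r_squared(observed, predicted), 3))
```

For these numbers the residual sum of squares is 1 and the total sum of squares is 20, giving an R² of 0.95, which would pass the 0.9 threshold quoted above.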

Table I: Effect of binder type and binder addition on ejection force and crushing strength.

The models showed that both ejection force and crushing strength of the tablets depended on all four input variables. In addition, the data mining discovered an interaction between the binder type and binder concentration for both properties, as illustrated for the ejection force in Figure 1. Rules for both properties are summarized in Table I, where it is clear that, regardless of concentration, ejection force is low when polyethylene glycol is used. Data mining has also highlighted that when Carbopol is used, the crushing strength is low, regardless of the concentration. Therefore, the data-mining exercise can provide useful pointers for the choice of binder, depending on the properties that are required of the finished tablet.

Figure 1: Graphical representation of model for ejection force.

The other submodels developed during data mining showed that both the ejection force and the crushing strength tend to increase when the number of compaction passes is increased from one to two, and that increasing the amount of MCC also increases the ejection force and crushing strength. These effects were less marked, however, and the major contribution to these two properties comes from the nature of the binder and its concentration. Models for the other properties showed that disintegration time did not depend significantly on the number of compaction passes, and the friability depended only on the binder type and binder concentration.


Modeling and optimization for spheronization

Hard gelatin capsules containing spheronized pellets are now a recognized delivery system for controlling or modifying the release of drugs in the body. Uniformity of pellet shape is essential both for coating and for subsequent filling into capsules. Baert et al. (4) investigated a binary mixture of Avicel PH 101 and water. Their work looked at how changing the Avicel:water ratio, the spheronization time, and the spheronization speed affects the yield and roundness of the spheronized particles. Forty-five different combinations of formulation and process conditions were investigated using three spheronizer speeds, five spheronization times, and three Avicel:water ratios. These data points were used in FormRules so that the key relationships could be extracted. The results showed that good models could be found for both roundness and yield. Had a poor model been obtained, it would have indicated that an uncontrolled variable was affecting the results. Here, that was not the case, and all important variables had been measured.
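The 45 combinations correspond to a full factorial design over the three variables (3 × 5 × 3). A minimal sketch of enumerating such a design follows. The level labels are placeholders, because the actual speeds, times, and ratios are not listed in this article.

```python
from itertools import product

# Placeholder level labels; the real values are given in Baert et al. (4)
speeds = ["speed_1", "speed_2", "speed_3"]
times = ["time_1", "time_2", "time_3", "time_4", "time_5"]
ratios = ["ratio_1", "ratio_2", "ratio_3"]

# Every combination of one speed, one time, and one ratio
design = list(product(speeds, times, ratios))

print(len(design))   # 45
print(design[0])     # ('speed_1', 'time_1', 'ratio_1')
```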

Figure 2: Graphical representation of model for roundness.

In the published data, the roundness of the particles was defined as the ratio of the largest to the smallest diameter of the spheres (i.e., the aspect ratio), so that a low value of roundness represents an approximately spherical particle. Our data-mining study shows that roundness is determined by all three input variables: Avicel:water ratio, spheronization speed, and spheronization time. As Figure 2 shows, all three variables contribute to the same submodel. This model indicates that there is an interaction between the variables, and it can clearly be seen when the model is expressed in rule form. The rules governing roundness, which were mined automatically from the data, are summarized in Table II. Values in parentheses are the confidence levels that indicate how high or low a particular value is.
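The roundness measure defined above is straightforward to compute. In this sketch the diameters are invented, and the function name `roundness` is chosen for illustration.

```python
def roundness(diameters):
    """Aspect ratio: largest measured diameter over the smallest.

    A perfect sphere gives 1.0; larger values mean a less spherical pellet.
    """
    return max(diameters) / min(diameters)


# Invented diameters (arbitrary units) for two pellets
print(roundness([1.00, 1.00, 1.00]))  # 1.0, perfectly spherical
print(roundness([1.30, 1.00, 1.10]))  # 1.3, markedly aspherical
```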

Table II: Effect of spheronization speed, spheronization time, and Avicel:water ratio on pellet roundness.

A full examination of the rules shows that when the spheronization time lies in the mid-to-high range of examined values, then the spheronization speed has its greatest influence when the Avicel:water ratio is high. If the spheronization speed is low and the Avicel:water ratio is high, the particles will be markedly aspherical. If the Avicel:water ratio is low (i.e., the water content is high), however, then the spheronization speed has only a small effect. Different behavior is observed when the spheronization time is short. In that case, the spheronization speed is important for all values of Avicel:water ratio. In this way, the data mining exercise highlights which inputs must be controlled most closely and also points out which combination of conditions will lead to the most spherical shapes.

The data mining exercise produces even more interesting results for the yield, because it was found that spheronization time did not contribute significantly to the model. There is an interaction between the spheronization speed and the Avicel:water ratio, as shown in the three-dimensional plot in Figure 3. Lower Avicel:water ratios and lower speeds led to the highest yields.

Figure 3: Contribution of the ratio of Avicel to water and spheronization speed to yield.

Using the INForm software system, it is also possible to determine the process conditions and formulation that lead to optimum yield and sphericity. The aim of the optimization is to produce a high yield of particles that have a roundness value close to one. Supplementary to this is the assumption that if spheronization time could be decreased without sacrificing yield or sphericity, then processing could be sped up. Within the INForm software system, the objectives can be assigned a relative importance. In this case, yield was assigned an importance of 8, and roundness was assigned an importance of 10 (on a 0–10 scale). Conditions leading to a roundness value >1.05 were assumed to be unacceptable when setting up the fitness criterion for the genetic algorithm optimization.
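One way to picture such a fitness criterion is as an importance-weighted combination of per-property desirabilities. The article does not say how INForm combines the objectives. The weighted mean, the linear desirability shapes, and the function names below are all assumptions; only the weights (8 and 10) and the 1.05 roundness limit come from the text.

```python
YIELD_IMPORTANCE = 8       # from the text, on a 0-10 scale
ROUNDNESS_IMPORTANCE = 10  # from the text, on a 0-10 scale
ROUNDNESS_LIMIT = 1.05     # values above this are treated as unacceptable


def yield_desirability(yield_pct):
    """Assumed shape: desirability rises linearly from 0% to 100% yield."""
    return max(0.0, min(1.0, yield_pct / 100.0))


def roundness_desirability(roundness):
    """Assumed shape: 1.0 at roundness 1.0, falling linearly to 0.0 at the limit."""
    return max(0.0, min(1.0, (ROUNDNESS_LIMIT - roundness) / (ROUNDNESS_LIMIT - 1.0)))


def overall_fitness(yield_pct, roundness):
    """Importance-weighted mean of the two desirabilities (an assumption)."""
    if roundness > ROUNDNESS_LIMIT:
        return 0.0  # unacceptable regardless of yield
    total = YIELD_IMPORTANCE + ROUNDNESS_IMPORTANCE
    return (YIELD_IMPORTANCE * yield_desirability(yield_pct)
            + ROUNDNESS_IMPORTANCE * roundness_desirability(roundness)) / total


print(overall_fitness(100.0, 1.0))   # best possible score
print(overall_fitness(90.0, 1.10))   # rejected: roundness above the limit
```

A genetic algorithm would then search the process and formulation variables for the combination whose predicted yield and roundness maximize this score.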

These optimization objectives could be achieved using a formulation containing 42.5% Avicel at a spheronization speed of 950 rpm and a time of 30 min. When the spheronization time was fixed at only 5 min, reasonable results could still be achieved using a similar formulation with a spheronization speed of 870 rpm, so that yield and roundness did not need to be sacrificed significantly when the spheronization time was reduced.


Optimizing tablet-coating processes

Another process that has been investigated using neural computing is tablet coating. Mitchell carried out 49 experiments on a coater (HCT-20 Mini Hi-Coater, Vector Corp., Marion, IA) to examine how batch size, drum speed, spray rate, atomizing air pressure, inlet air temperature, inlet air relative humidity, exhaust temperature, and bed temperature affect the color and color variation of a tablet coating (5). As she noted, the inlet air relative humidity, the exhaust temperature, and the bed temperature could not be controlled with the HCT-20 machine. It was clear from her results, however, that the tablet-bed temperature had a significant effect on the properties of the finished coating, because leaving this parameter out of the models gave a considerably poorer fit to the experimental data. The tablet-bed temperature depends on the inlet-air temperature and on the volumetric airflow rate, but there is limited control over these factors in the HCT-20.

Mitchell's data have been examined using neural networks (embodied in the INForm software package) using 45 experiments for training and the remaining 4 for model validation. Using all eight input variables, good models, as assessed by analysis of variance statistics, could be obtained for all properties except gloss. Reasonably good models could be obtained when only the five directly controllable variables were used.
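Holding back a few records for validation, as described above, can be sketched as follows. The article does not say how the four validation experiments were chosen, so the seeded random selection here is an assumption.

```python
import random


def split_holdout(n_records=49, n_validation=4, seed=0):
    """Return (training indices, validation indices) for a simple holdout.

    Random selection with a fixed seed is an illustrative assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    validation = sorted(indices[:n_validation])
    training = sorted(indices[n_validation:])
    return training, validation


train_idx, valid_idx = split_holdout()
print(len(train_idx), len(valid_idx))  # 45 4
```

The model is fitted only to the training records. Its predictions for the held-back records then give an independent check on how well it generalizes.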

Mitchell also reported an optimization of the process conditions using the neural network models. The optimization was performed using genetic algorithms to evolve the fittest solutions, where the criterion of fitness involved fuzzy logic to specify the process goals. The aim was to have the color around 80 and the color variation as low as possible to indicate a uniform film. It is possible within the INForm software system to select which of the properties is the most important by weighting the properties on a 0–10 scale, and Mitchell chose to treat color and color variation as equally important. She also allowed all the input variables, including the batch size, to participate in the optimization. This gave a set of optimization conditions for the HCT-20 coater, suggesting that the optimum batch size was 309 g, with a drum speed of 11 rpm and a spray rate of 2 g/min.

Table III: Optimum process conditions for the HCT-20 coater.

Nonetheless, it may be more convenient to assume a specific batch size and to determine the best operating conditions for that. From the optimization performed here for various batch sizes, the best conditions are summarized in Table III. At larger batch sizes, the optimum values for drum speed and atomizing air pressure are at the low end of the experimentally used range, thereby suggesting that a further decrease, if feasible, would be beneficial.

The ability to fix specific input parameters and to specify the relative importance of each of the properties gives a high degree of control over the optimization, allowing a formulator or process engineer to determine the best operating conditions for their equipment. A sensitivity analysis function is built into the software system so that the robustness of the process to small variations in the process conditions can be assessed.
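The idea behind a sensitivity analysis can be illustrated with a one-at-a-time perturbation of each input around an operating point. This is a generic finite-difference sketch, not the function built into the software. The `toy_model` response is hypothetical, while the operating point uses the drum speed (11 rpm) and spray rate (2 g/min) quoted above.

```python
def sensitivity(model, point, rel_step=0.01):
    """Approximate the change in model output per unit change in each input.

    Each input is increased by `rel_step` (1% by default) of its value
    while the others are held fixed (forward finite difference).
    """
    base = model(point)
    result = {}
    for name, value in point.items():
        step = abs(value) * rel_step or rel_step  # avoid a zero step
        perturbed = dict(point, **{name: value + step})
        result[name] = (model(perturbed) - base) / step
    return result


def toy_model(p):
    """Hypothetical stand-in for a trained model of a coating property."""
    return 2.0 * p["drum_speed"] - 0.5 * p["spray_rate"] ** 2


operating_point = {"drum_speed": 11.0, "spray_rate": 2.0}
slopes = sensitivity(toy_model, operating_point)
for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f} per unit")
```

Inputs with large slopes are the ones where small drifts in the process conditions matter most, so they are the ones to control most tightly.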

Benefits in process development: better products faster

Neural computing and advanced computing systems have a proven track record in pharmaceutical formulation (1, 2). Their implementations in INForm and FormRules allow formulators and process engineers to extract maximum value from their formulation and processing data, thereby improving both the efficiency and the effectiveness of their product development cycle. Using these approaches, key relationships can be discovered rapidly, expressed succinctly, and communicated clearly. Because the methods are entirely data-driven, they can be applied to many formulation and processing problems. The only requirement is the availability of data of reasonable quality and quantity.

Forward-thinking pharmaceutical companies are already adopting these technologies as part of decision-support tool kits for their product formulators and process engineers, and all indications are that such advanced computing techniques will be used routinely in the future.

Elizabeth A. Colbourn* is the product director at Intelligensys Ltd., Springboard Business Centre, Ellerbeck Way, Stokesley, TS9 5JZ, UK. C. Rowe is the chief scientist at Intelligensys Ltd., Springboard Business Centre, Ellerbeck Way, Stokesley, TS9 5JZ, UK.

*To whom all correspondence should be addressed.

Keywords: advanced computing, data mining, formulation, fuzzy logic, genetic algorithms, neural networks, software


1. E.A. Colbourn and R.C. Rowe, "Neural Computing Boosts Formulation Productivity," IT Innovations supplement to Pharm. Technol. 22–25 (2003).

2. E.A. Colbourn and R.C. Rowe, "Neural Computing and Formulation Optimization," in Encyclopedia of Pharmaceutical Technology (Marcel Dekker, New York, NY, 2005).

3. M. Turkoglu et al., "Modelling of a Roller Compaction Process using Neural Networks and Genetic Algorithms," Eur. J. Pharm. Biopharm. 48, 239–245 (1999).

4. L. Baert et al., "Studies of Parameters Important in the Spheronization Process," Int. J. Pharm. 96, 225–229 (1993).

5. K. Mitchell, "The Scale-Up, Modelling, and Optimisation of Aqueous Film Coating Processes," PhD Thesis, University of Bradford, UK (2003).