How a radical idea that molecules intercommunicate at long distances can speed up drug discovery and cut costs
Dr. Mae-Wan Ho
“While biotech offers significant promise in treating entire categories of disease for which no medicine previously existed, it comes at a significant cost,” said Joseph A. DiMasi, director of economic analysis at the Tufts Centre for the Study of Drug Development (CSDD). CSDD estimates that it takes US$1.2 billion on average to develop a new biotechnology product, a figure that reflects the cost of drugs that fail in testing and the length of time required to bring a new biopharmaceutical to market. A new biotech product took 97.7 months on average to wend its way through clinical development and regulatory review, about 8 percent longer than for pharmaceuticals.
But it is the process of discovering drugs in the first place that has proved increasingly difficult and costly. The head of science and technology at Eli Lilly, Steven Paul, warned that the cost of producing a successful drug could top $2 billion by 2010 unless the pharmaceutical industry can identify new and better ways to improve the efficiency and effectiveness of drug discovery and clinical trials.
Success rates to market vary from 12 to 33 percent, and total R&D costs are increasing at an annual rate of 7.4 percent above general price inflation.
The pre-clinical R&D cost is 37.3 percent of the total, and here, computer-aided design techniques are becoming increasingly important, holding out hope of reducing costs and speeding up new drug development. But that too has become more complex and time consuming owing to the proliferation of data, both real and virtual.
The cornerstones of drug discovery are combinatorial chemistry and high throughput screening. Combinatorial chemistry is the systematic synthesis of large numbers of chemical compounds by combining sets of building blocks. A combinatorial robotic system can produce 100 000 or more compounds a year compared to the 100 that a traditional chemist can make. Combinatorial chemistry also involves creating ‘virtual libraries’ of possible compounds with different structures, out of which researchers select a subset for actual synthesis based on various calculations of structural and electronic properties.
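To see why such libraries grow so quickly, here is a minimal sketch of enumerating a virtual library by taking one building block from each set. The building-block names are hypothetical placeholders, not reagents mentioned in this article.

```python
from itertools import product

# Hypothetical building blocks for a three-position scaffold (illustrative names only).
cores = ["core_A", "core_B"]
r1_groups = ["methyl", "ethyl", "phenyl"]
r2_groups = ["amine", "hydroxyl", "carboxyl", "halide"]


def enumerate_library(*block_sets):
    """Return every combination that takes one building block from each set."""
    return list(product(*block_sets))


library = enumerate_library(cores, r1_groups, r2_groups)
print(len(library))  # 2 * 3 * 4 = 24 virtual compounds
```

The library size is the product of the set sizes, so a few dozen blocks per position already give many thousands of virtual compounds.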
To deal with the proliferation of drug candidates, automated high throughput screening has been developed, in which batches of compounds are tested for binding activity or biological activity against target molecules. As in combinatorial chemistry, virtual screening for potential drugs can be done with sophisticated computational and mathematical approaches. High throughput and virtual screening are complementary, and various statistical, informatics and filtering methods have been introduced to integrate experimental and in silico screening.
In 2002, it was anticipated that screening one million compounds per target would soon become the gold standard for the major pharmaceutical companies. The 96-well plates had been largely replaced by 384-well microplates and screening robots fully adapted to desktop environments.
Many tools have been developed for virtual screening. A ‘pharmacophore’ is the spatial arrangement of chemical groups or features in a molecule known or thought to determine its activity. The most popular pharmacophore models consist of three or four points separated by defined distance ranges. In most cases, pharmacophore geometries are not known from experiment, but predicted. ‘Similarity searching’ involves using 2D or 3D pharmacophore models to identify potential drug candidates.
A ‘molecular graph’ is a two-dimensional representation of the connectivity pattern in a molecule, with atoms shown as vertices and bonds as edges.
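As a small illustration of this data structure, the sketch below stores ethanol (hydrogens omitted) as a list of atoms and bonds and builds an adjacency list from them. The indexing scheme and function name are illustrative choices, not taken from any particular screening package.

```python
# Ethanol (CH3-CH2-OH) with hydrogens omitted: vertices are atoms, edges are bonds.
atoms = {0: "C", 1: "C", 2: "O"}
bonds = [(0, 1), (1, 2)]


def adjacency(atoms, bonds):
    """Build an adjacency list mapping each atom index to its bonded neighbours."""
    neighbours = {index: [] for index in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours


graph = adjacency(atoms, bonds)
# Degree of each vertex, i.e. the number of heavy-atom neighbours.
degrees = {index: len(linked) for index, linked in graph.items()}
print(graph)
print(degrees)
```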
‘Quantitative structure-activity relationship’ (QSAR) refers to methods that relate structural features of molecules to biological activity in quantitative terms. In most cases, QSAR tries to establish linear relationships between selected structural features in a series of related molecules and their known level of activity. If successful, models derived from the training set can be applied to predict molecules with higher potency.
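The simplest form of such a linear relationship can be sketched as a one-descriptor least-squares fit. The training numbers below are invented purely for illustration; real QSAR models use measured activities and usually many descriptors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training set (invented values): one structural descriptor vs. known activity.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(descriptor, activity)


def predict(x):
    """Predict the activity of a new molecule from its descriptor value."""
    return slope * x + intercept


print(round(slope, 2), round(intercept, 2))
```

Extrapolating with `predict` beyond the descriptor range of the training set is exactly where such models become unreliable, which is one reason training sets are chosen to span a wide activity range.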
Various filtering procedures are applied to remove molecules with reactive or toxic groups, or to select for aqueous solubility. For example, ‘log P’, the logarithm of the partition coefficient between octanol and water (which measures a compound's solubility in oil compared to water), is frequently used. Another filter is ‘Lipinski's rule-of-five’, which states that candidate compounds are likely to have unfavourable absorption, permeation and bioavailability characteristics if they contain more than 5 H-bond donors, more than 10 H-bond acceptors, a log P greater than 5, and/or a molecular mass of more than 500 Da.
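The rule-of-five as stated above translates directly into a filter. This is a minimal sketch: the candidate records are hypothetical, and the choice to reject on any single violation (`max_violations=0`) is an assumption, since some workflows tolerate one.

```python
def rule_of_five_violations(h_bond_donors, h_bond_acceptors, log_p, mass_da):
    """Count how many of the four rule-of-five thresholds a compound exceeds."""
    checks = [
        h_bond_donors > 5,
        h_bond_acceptors > 10,
        log_p > 5,
        mass_da > 500,
    ]
    return sum(checks)


def passes_filter(compound, max_violations=0):
    """Keep a compound only if it exceeds at most `max_violations` thresholds (assumed policy)."""
    count = rule_of_five_violations(
        compound["donors"], compound["acceptors"], compound["log_p"], compound["mass_da"]
    )
    return count <= max_violations


# Hypothetical candidates with invented property values.
candidates = [
    {"name": "small_polar", "donors": 2, "acceptors": 5, "log_p": 1.8, "mass_da": 320.0},
    {"name": "large_greasy", "donors": 6, "acceptors": 12, "log_p": 6.2, "mass_da": 710.0},
    {"name": "heavy_only", "donors": 1, "acceptors": 4, "log_p": 3.0, "mass_da": 540.0},
]

kept = [c["name"] for c in candidates if passes_filter(c)]
print(kept)
```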
When many properties are input into the search, the output can be quite bewildering, and this calls for further mathematical tools. ‘Principal component analysis’ transforms correlated variables into a smaller number of uncorrelated ones. A ‘genetic algorithm’ is a trial-and-error problem-solving approach that recombines parameters to arrive at the best solution. ‘Neural networks’ are analytical techniques modelled on the hypothesized process of learning in the human brain, and are capable of recognizing a candidate ‘hit’ after a process of training on existing data.
Obviously, the more tools used in virtual screening, the more computer time it takes, and the less transparent the results become.
Knowing how molecules recognize each other at a distance could cut costs and speed up drug discovery
A key factor in both drug design and protein engineering is understanding how molecules recognize each other. Here, the static ‘lock and key’ model or ‘induced fit’ model based on short-range interactions between molecules still serves as the basis of the different computer ‘docking tools’ currently used, usually as a final check on candidate drugs.
Veljko Veljkovic, who heads the Centre for Multidisciplinary Research, Institute of Nuclear Sciences, Belgrade, Serbia, has devised a model based on long-range interactions that promises to drastically cut the costs and speed up drug discovery and design.
The work began in the 1970s, when Veljkovic proposed that the long-range properties of biological molecules depend on just two measures: the average number of valence electrons (electrons that can engage in chemical bonds), referred to as the average quasivalence number (AQVN), and the electron-ion interaction potential (EIIP), which gives the energy level of the electrons. Soon, he was able to demonstrate a strong connection between the precise values of EIIP and AQVN of organic molecules and their biological activity, whether they are mutagens, carcinogens, or have toxic, antibiotic or cytostatic (anti-cancer) activity, etc. [6, 7].
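The article does not give the formulas, so the following is only a sketch under stated assumptions. It uses the expressions usually quoted for this model: AQVN as Z* = (sum of n_i x Z_i) / N over the atoms of the molecular formula, and EIIP = 0.25 Z* sin(1.04 pi Z*) / (2 pi) in Rydberg. The valence table and the use of the absolute value are assumptions made for illustration.

```python
import math

# Assumed valence numbers for common elements in organic molecules.
VALENCE = {"H": 1, "C": 4, "N": 5, "O": 6, "S": 6, "P": 5, "F": 7, "Cl": 7}


def aqvn(composition):
    """Average quasivalence number Z* from an element -> atom-count mapping."""
    total_atoms = sum(composition.values())
    total_valence = sum(VALENCE[element] * count for element, count in composition.items())
    return total_valence / total_atoms


def eiip(z_star):
    """EIIP in Rydberg from Z* (absolute value of the assumed general-model expression)."""
    return abs(0.25 * z_star * math.sin(1.04 * math.pi * z_star) / (2 * math.pi))


# Example: any molecule with the formula C6H12O6, whatever its structure.
hexose = {"C": 6, "H": 12, "O": 6}
z_star = aqvn(hexose)
print(z_star, round(eiip(z_star), 4))
```

Because only the molecular formula enters the calculation, every isomer of C6H12O6 gets the same pair of values, which is the structure-independence the model relies on.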
The values of AQVN and EIIP do not depend on the molecular structure; that means molecules differing widely in molecular structure may nonetheless share the same values, and this could be very important in discovering whole new classes of drugs.
Veljkovic and his former graduate student Irena Cosic, now at RMIT University in Melbourne, Australia, have shown how sequences of amino acids in proteins or bases in DNA can be represented as a linear array of EIIP values and subjected to signal processing analysis to extract the ‘bioinformation’ encoded (The Real Bioinformatics Revolution, SiS 33). That is proving invaluable in identifying the functions of proteins and genes, as well as in protein engineering and peptide drug design. Here, we shall see how the method can help in discovering drugs, using anti-HIV drugs as an example.
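A rough sketch of that idea follows: a protein sequence is mapped to a numerical series of per-residue EIIP values, and a discrete Fourier transform gives its power spectrum. The article does not specify the signal-processing step, so the plain DFT here is an assumption, and the EIIP table is the one commonly reproduced in the literature on this method; check it against a primary source before relying on it.

```python
import cmath

# Per-residue EIIP values (Ry) as commonly tabulated; verify before real use.
EIIP_AA = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(sequence):
    """Translate a one-letter amino acid sequence into a list of EIIP values."""
    return [EIIP_AA[residue] for residue in sequence.upper()]


def power_spectrum(signal):
    """Power of the mean-removed signal at DFT frequencies k = 1 .. n // 2."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [value - mean for value in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(centred)
        )
        spectrum.append(abs(coefficient) ** 2)
    return spectrum


# Toy sequence with a strict two-residue repeat, chosen so the peak is easy to see.
spectrum = power_spectrum(eiip_signal("LDLDLDLD"))
peak_k = spectrum.index(max(spectrum)) + 1
print(peak_k)
```

For the toy sequence the peak sits at the highest frequency, reflecting the period-two repeat; in real sequences the interest lies in peaks shared by proteins with the same function.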
Approximately 76 percent of HIV+ patients with a measurable viral load are infected with a strain of the virus that is resistant to one or more classes of antiretroviral agents, and a new generation of antiviral drugs intended to counter HIV-1 entry into susceptible cells is now being developed. The HIV-1 co-receptors are attractive targets, especially CCR5, which is essential for viral transmission and replication during the early clinically latent phase and also during late stage disease.
To establish an AQVN/EIIP criterion for selecting HIV-1 entry inhibitors, Veljkovic and colleagues carried out a virtual (computer) screening of molecular libraries to identify a ‘training set’, following proposed guidelines. The guidelines stipulate a minimum of 16 diverse compounds, to avoid chance correlation, with activities spanning 4-5 orders of magnitude, chosen to give clear and concise information without redundancy or bias in structural features or activity range. The selection should include the most active compounds, to provide information on the features most critical for a drug, and should avoid any compound known to be inactive because of steric hindrance (shape incompatibility).
In the training set selected, 82.4 percent of the compounds, and 90 percent of the most active CCR5 inhibitors, have EIIP values of 0.079-0.099 Ry and AQVN values in the interval 2.42-2.63. By comparison, naturally occurring compounds form a homogeneous set encompassing EIIP values between 0 and 0.13 Ry and AQVN values in the interval 2.0-3.8.
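Once the intervals are fixed, applying the criterion to a library is a simple range check. The sketch below uses the intervals quoted above; the library entries and their descriptor values are invented, and treating the interval bounds as inclusive is an assumption.

```python
# Intervals quoted in the article for CCR5 inhibitors.
EIIP_RANGE_RY = (0.079, 0.099)
AQVN_RANGE = (2.42, 2.63)


def fits_ccr5_criterion(eiip_ry, aqvn):
    """True when both descriptors fall inside the quoted intervals (bounds inclusive)."""
    return (
        EIIP_RANGE_RY[0] <= eiip_ry <= EIIP_RANGE_RY[1]
        and AQVN_RANGE[0] <= aqvn <= AQVN_RANGE[1]
    )


def prescreen(library):
    """Keep only the (name, eiip_ry, aqvn) entries that fit the criterion."""
    return [entry for entry in library if fits_ccr5_criterion(entry[1], entry[2])]


# Hypothetical library entries with invented descriptor values.
library = [
    ("compound_1", 0.085, 2.50),
    ("compound_2", 0.120, 3.40),
    ("compound_3", 0.095, 2.45),
    ("compound_4", 0.030, 2.90),
]

hits = prescreen(library)
print([name for name, _, _ in hits])
print(f"{len(hits) / len(library):.0%} of the library retained")
```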
In order to validate the proposed criterion, it was applied in virtual screening of the NIH molecular libraries of CCR5 inhibitors and HIV-1 integrase inhibitors. It turned out that 65.61 percent of compounds from the NIH library of CCR5 inhibitors, used as positive controls, fit the proposed EIIP/AQVN criterion, while only 2.67 percent of compounds from the NIH HIV-1 integrase library, used as the negative controls, have EIIP within the range of 0.079-0.099 Rydberg and AQVN between 2.42 and 2.63.
Of the CCR5 inhibitors that are in clinical trials, four of the five compounds fit the criterion.
For further validation of the criterion, the large PubChem Substances Database (NCBI) and ChemBank Small Molecules Bioactive Database (NCI, NIH), encompassing natural biologically active substances, were screened. Of the 798 793 compounds in NCBI, 673 654 (84.33 percent) do not satisfy the criterion for CCR5 inhibitors. A similar set of results was obtained for compounds from NCI. These results show that EIIP/AQVN-based virtual screening of molecular libraries can significantly reduce the number of compounds that need to be subjected to the more sophisticated and time consuming methods.
An important advantage of EIIP/AQVN-based virtual screening is that it avoids selecting candidate CCR5 inhibitors that, despite having the appropriate structural features, cannot effectively prevent HIV infection. There is a large series of piperidine- and piperazine-based potential CCR5 antagonists, and eliminating those that are likely to be inactive will mean less synthesis, less actual screening and, further down the line, fewer clinical tests.
Veljkovic and colleagues showed that in a training set of 25 piperidine and piperazine compounds selected by another researcher to develop a predictive pharmacophore, 24 fit the EIIP/AQVN criterion.
Similarly, screening 73 compounds from Interchim's piperidine database revealed that only 11 (15 percent) could be selected as candidate CCR5 antagonists, while 473 (70 percent) of the 675 piperidines in the NIH library of CCR5 inhibitors were found to meet the criterion.
The EIIP/ISM method is so simple that it can screen about 50 000 compounds on an ordinary PC in 5 to 10 minutes. Veljkovic and colleagues suggest it can be applied as an initial prescreen to select candidates for more complex and time consuming methods that use many different descriptors, functional forms and methods, from simple linear equations through to multilayer neural nets.
Another impressive demonstration of the EIIP/AQVN method is in identifying flavonoids that have anti-HIV-1 activity.
Within the past ten years, many different classes of compounds have been reported to inhibit HIV-1 replication, and naturally occurring plant flavonoids are among them. There are huge molecular libraries of flavonoids, however, and it is important to discriminate flavonoids that are active against HIV-1 from those that are not.
A research team at the University of São Paulo, Brazil, used a complex virtual screening algorithm based on quantum chemistry calculations and pattern recognition methods that include principal component analysis, hierarchical cluster analysis, stepwise discriminant analysis and K-nearest neighbour analysis. The end result was to show that log P (the partition coefficient), molecular volume and electron affinity are the essential variables for discriminating anti-HIV-1 active and inactive flavonoid compounds.
Veljkovic and his team used the same training set as the research team in São Paulo, and calculated the EIIP and AQVN. All the active anti-HIV flavonoid compounds fell within the AQVN interval of 3.34-3.59 and the EIIP interval of 0.1100-0.1350, while all inactive flavonoids fell outside those intervals.
Having defined the interval for active anti-HIV-1 compounds, they applied the criterion to the set of 17 new compounds that were included in another laboratory's training set. The results showed that 6 out of 8 active and all inactive compounds were in agreement with the EIIP/AQVN criterion. The exceptions were identified as “false negatives”, so this indicates that the predictive capacity of the EIIP/AQVN criterion is about 90 percent.
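The "about 90 percent" figure can be checked with a short calculation. This sketch assumes that the 17 compounds split into the 8 active ones mentioned and 9 inactive ones; the article gives the total and the active count but not the inactive count explicitly.

```python
total_compounds = 17
active = 8
active_correct = 6

# Assumption: the remaining compounds are the inactive ones.
inactive = total_compounds - active
# The article states that all inactive compounds agreed with the criterion.
inactive_correct = inactive

accuracy = (active_correct + inactive_correct) / total_compounds
sensitivity = active_correct / active        # share of active compounds recognised
specificity = inactive_correct / inactive    # share of inactive compounds recognised

print(f"accuracy {accuracy:.1%}, sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
```

Under that assumption, 15 of 17 predictions agree, or roughly 88 percent, with both errors being missed active compounds rather than false alarms.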
Next, they analysed five flavonoids isolated from the leaves of Nelumbo nucifera and three flavonoids isolated from the Caribbean sea grass Thalassia testudinum. All fell within the interval of active anti-HIV-1 compounds, in agreement with their reported activity.
The Belgrade team then compared the predictive capacity of the EIIP/AQVN criterion with the more complex algorithm proposed by the São Paulo team. They performed a prediction study with a set of nine further flavonoids, numbered I to IX, that the São Paulo team had worked on. All were predicted as inactive except for compound VIII. The EIIP/AQVN criterion came up with exactly the same prediction.
More than 4 000 structurally unique flavonoids have been identified from plants. Because of their high biological activity and low toxicity, they are favourite drug candidates, particularly as tests with fruit juices rich in flavonoids have given encouraging results in HIV+ individuals. USDA's new database lists 25 of the most commonly occurring flavonoids in food.
The Belgrade team calculated the EIIP and AQVN values for these flavonoids and predicted their possible anti-HIV activity. Eleven of the 25 flavonoids are predicted as active, and according to the existing literature, 8 of the 11 predicted as active (kaempferol, myricetin, luteolin, quercetin, (-)-epigallocatechin 3-gallate, theaflavin-3,3'-digallate, theaflavin-3'-gallate, theaflavin-3-gallate) have been reported as anti-HIV-1 compounds. And not one of the 14 predicted as inactive has been reported as an anti-HIV-1 compound in the literature.
Readers should use these results and USDA's database to find the foods that could have a beneficial effect for HIV patients. Five of the eight flavonoids identified as active against HIV-1 are found in green and black tea (Green Tea, The Elixir of Life? SiS 33).
These results show that the simple EIIP/AQVN criterion can be used to discriminate flavonoids that are active or inactive in inhibiting HIV infection, comparable to other more complex approaches. It is a powerful virtual screening tool that will drastically cut costs and speed up drug discovery. The basic idea that molecules recognize each other at long distances also has profound implications for biochemistry and medicine.
Article first published 09/02/07