A recently developed deep neural network may present a solution to growing concerns over antibiotic resistance.
by Sophia Evanisko
Health organizations around the world have voiced concern over growing antibiotic resistance, driven by the misuse of antibiotics in both humans and animals, and cite it as one of today's biggest threats to global health. Without an alternative, the number of severe, often fatal infections the world faces, as well as mortality rates, will skyrocket as antibiotics continue to become less effective. Now, a promising new machine-learning model developed by GMU researchers aims to identify possible antimicrobial activity in natural peptide sequences, which could serve as new drug templates in place of our current, increasingly ineffective antibiotics.
The team developed a deep neural network (DNN), a multi-layer computer system modeled on the human brain, and a series of search methods to find and identify naturally occurring antimicrobial peptides, or AMPs, that appear to be promising candidates for the new templates. The DNN uses complex mathematical modeling to evaluate each peptide sequence and assigns it a score that represents the peptide's probability of having antimicrobial qualities (PDNN). It analyzes each peptide with multiple layers, two of the most important being the convolutional and recurrent layers. The convolutional layer allows the model to recognize position-invariant patterns along the length of the peptide's amino acid sequence, while the recurrent layer allows it to recognize and forget gap-sequence patterns.
The DNN was developed by the study's second author, Daniel Veltri, and the search methods were developed by Manpriya Dua, a PhD candidate at George Mason University. Two others from the National Institutes of Health, as well as the George Mason departments of Chemistry and Computer Science, also aided in the research.
The team aimed to identify as many possible AMPs as they could, a massive task since AMPs range in length from just a few to nearly a hundred amino acids, built from combinations of the 20 naturally occurring amino acids. To reach their goal, they developed four different search methods and compared their ability to identify regions of novel AMPs in a high-dimensional amino acid sequence space.
The first method is a randomized global search, in which each peptide sequence is sampled at random. First, a random length for the peptide sequence is selected from a specified range of values. Next, the amino acid at each position is sampled uniformly from the known amino acid identities. Finally, each sequence is scored by the DNN, which gives the probability that it has antimicrobial activity.
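The three steps of the randomized global search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a hypothetical placeholder for the team's DNN (it simply measures the fraction of lysine and arginine residues and is not a real predictor), and the length range and function names are invented for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def toy_score(seq):
    """Hypothetical stand-in for the DNN score, NOT a real AMP predictor.

    Returns the fraction of K/R residues, a value between 0 and 1.
    """
    return sum(seq.count(a) for a in "KR") / len(seq)


def random_peptide(min_len, max_len, rng):
    # Step 1: pick a random length from the specified range (inclusive).
    length = rng.randint(min_len, max_len)
    # Step 2: sample each position uniformly over the 20 amino acids.
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def random_global_search(n_samples, min_len=10, max_len=50, score=toy_score, seed=0):
    """Sample n_samples random peptides and score each one (step 3)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        seq = random_peptide(min_len, max_len, rng)
        results.append((seq, score(seq)))
    return results


if __name__ == "__main__":
    for seq, p in random_global_search(3):
        print(f"{seq}  score={p:.2f}")
```

With a real model, `score` would be replaced by a call that returns the network's predicted probability for the sequence.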
The second method is a hybrid search that combines the procedure of the first method with a greedy local improvement operator. First, it generates a random sequence, as in the first method. Next, it randomly selects one position in the sequence and replaces the amino acid there with another randomly chosen amino acid. It then compares the new sequence's DNN-predicted score with the old score: if the new score is higher, the new amino acid replaces the old one; otherwise, the old sequence is retained. A sequence is accepted if its score corresponds to a probability above 50% that it is an AMP.
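A minimal sketch of the greedy local improvement step follows. As before, `toy_score` is a hypothetical placeholder for the DNN, and the number of mutation attempts (`n_steps`) is an assumption; the article does not say how many moves the search makes.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(seq):
    """Hypothetical stand-in for the DNN score (fraction of K/R residues)."""
    return sum(seq.count(a) for a in "KR") / len(seq)


def mutate(seq, rng):
    # Replace the amino acid at one randomly chosen position.
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def greedy_improve(seq, score=toy_score, n_steps=200, threshold=0.5, seed=0):
    """Keep a point mutation only when it strictly raises the score.

    Returns the final sequence, its score, and whether the score
    exceeds the 0.5 acceptance threshold described above.
    """
    rng = random.Random(seed)
    current = score(seq)
    for _ in range(n_steps):
        candidate = mutate(seq, rng)
        candidate_score = score(candidate)
        if candidate_score > current:  # greedy: never accept a worse sequence
            seq, current = candidate, candidate_score
    return seq, current, current > threshold


if __name__ == "__main__":
    final, p, accepted = greedy_improve("A" * 20)
    print(final, round(p, 2), accepted)
```

Because the score can never decrease, the search is fast but can stall once no single substitution helps.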
The third method is another hybrid search, one that uses a Metropolis Monte Carlo local search instead of the greedy local improvement operator, because the greedy search will often converge prematurely to a local optimum in the vicinity of the starting sequence. First, as in the other two methods, a random sequence is generated and scored. Next, the search makes a move by replacing an individual amino acid in the sequence and scores the new peptide. If the move results in a higher probability, the new peptide is accepted and the search continues to make individual moves. Otherwise, the new peptide is accepted or rejected according to a selected acceptance probability.
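The sketch below illustrates a Metropolis-style local search. The article does not give the acceptance probability the team used, so this example assumes the standard Metropolis criterion, in which a lower-scoring move is accepted with probability exp((new - old) / T) for a fixed temperature T. The scoring function `toy_score`, the temperature value, and the step count are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(seq):
    """Hypothetical stand-in for the DNN score (fraction of K/R residues)."""
    return sum(seq.count(a) for a in "KR") / len(seq)


def metropolis_accept(old, new, temperature, rng):
    # Improvements are always accepted.
    if new > old:
        return True
    # Assumed standard Metropolis rule for non-improving moves.
    return rng.random() < math.exp((new - old) / temperature)


def metropolis_search(seq, score=toy_score, n_steps=500, temperature=0.05, seed=0):
    """Run a fixed-temperature Metropolis search; return the best sequence seen."""
    rng = random.Random(seed)
    current = score(seq)
    best_seq, best = seq, current
    for _ in range(n_steps):
        pos = rng.randrange(len(seq))
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        candidate_score = score(candidate)
        if metropolis_accept(current, candidate_score, temperature, rng):
            seq, current = candidate, candidate_score
            if current > best:
                best_seq, best = seq, current
    return best_seq, best


if __name__ == "__main__":
    print(metropolis_search("A" * 15))
```

Occasionally accepting a worse sequence is what lets this kind of search escape the local optimum where a purely greedy search would stop.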
The final, most advanced method is a hybrid search with a simulated annealing Metropolis Monte Carlo search. Whether the Metropolis Monte Carlo search used in the third method outperforms the greedy local improvement operator used in the second typically depends on temperature, the parameter that controls how readily the search accepts lower-scoring moves, and this method takes that into account. It begins the search at a high temperature and gradually lowers it to focus on exploring the near-optimal regions of the sequence space.
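To make the role of temperature concrete, here is a small simulated annealing sketch. The article does not describe the cooling schedule, so a geometric schedule is assumed here purely for illustration, along with the standard Metropolis acceptance rule; `toy_score`, the start and end temperatures, and the step count are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_score(seq):
    """Hypothetical stand-in for the DNN score (fraction of K/R residues)."""
    return sum(seq.count(a) for a in "KR") / len(seq)


def cooling_schedule(t_start, t_end, n_steps):
    """Assumed geometric cooling from t_start down to t_end (n_steps >= 2)."""
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio ** k for k in range(n_steps)]


def simulated_annealing(seq, score=toy_score, t_start=1.0, t_end=0.01,
                        n_steps=500, seed=0):
    """Metropolis search whose temperature falls at every step."""
    rng = random.Random(seed)
    current = score(seq)
    best_seq, best = seq, current
    for temperature in cooling_schedule(t_start, t_end, n_steps):
        pos = rng.randrange(len(seq))
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        candidate_score = score(candidate)
        delta = candidate_score - current
        # High temperature: worse moves are often accepted (exploration).
        # Low temperature: behaviour approaches the greedy search.
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            seq, current = candidate, candidate_score
            if current > best:
                best_seq, best = seq, current
    return best_seq, best


if __name__ == "__main__":
    print(simulated_annealing("A" * 15))
```

Early on, the high temperature lets the search roam widely; as the temperature drops, it settles into the highest-scoring region it has found.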
The experiment showed promising results: the fourth method, which used a simulated annealing Metropolis Monte Carlo search, was able to generate 80% AMPs, with the other methods not far behind. The team is eager to continue developing new methods, in hopes of making even more progress in discovering these useful new peptide sequences.