
AI is crying out for regulation, while virologists doing gain-of-function research take the opposite tack. Why?

Over the past few months, prominent tech leaders have been raising alarms about the dangers of AI, and politicians are following suit. Just last week, the Senate held hearings on how to regulate AI. The tech industry itself is calling for regulation: just a few days ago, Microsoft’s CEO testified before Congress and asked the federal government “to govern AI at every part of its lifecycle.”

One of the founders of AI, Geoffrey Hinton, just left his high-level position at Google so that he could criticize AI without any constraints from his employer. And a couple of weeks ago, I asked the AI program ChatGPT if we should trust AI. No way, it told me.

This is all kind of surprising. AI experts seem to be saying “stop us before we do any harm.” It’s also kind of refreshing: usually the private sector wants the government to stay out of its affairs.

Now contrast all this with the behavior of virologists on a completely different technology: gain-of-function research on deadly pathogens. As I’ve explained before, gain-of-function (GoF) research takes a deadly pathogen, such as the influenza virus or the Covid-19 virus, and modifies it to make it even more deadly. Many scientists, including me, find this work both alarming and of little benefit, and we’ve been calling for it to be regulated for a decade now.

However, unlike AI experts, many virologists are opposed to any hint of regulation of their GoF work. On the contrary: just recently, 156 leading virologists jointly authored an opinion piece that lauded the many wonderful benefits of GoF, and pooh-poohed any risks.

Don’t worry your pretty little heads, these virologists seem to be saying to the rest of the world. We know what we’re doing, and it’s not that risky. Plus it’s great! Not to put too fine a point on it, but I disagree.

What’s caught my attention this week is not just the contrast in their willingness to be regulated, but the question of how regulation might actually work in each case.

Simply defining what we mean by “AI” today is probably impossible. The number and types of programs that incorporate some form of artificial intelligence are vast, and they already affect our lives in many ways. The recent alarm bells were caused by one particular type of AI, known as large language models (LLMs), which have the ability to fool people in a new way. For several years now, more alarm bells have sounded (justifiably so) over “deep fakes,” images or videos that appear real but that are completely made up. These use completely different technology.

So even if we agree that AI needs to be reined in, no one can really say with any precision what that would mean.

Now let’s look at gain-of-function research on pathogens. One of the biggest objections that some virologists have made, on many occasions, is that there’s no way to define just the harmful research, so we really should just leave it all alone.

For example, the recent commentary by 156 virologists said that “gain-of-function approaches incorporate a large proportion of all research because they are a powerful genetic tool in the laboratory.” This is nonsense. It’s equivalent to saying “hey, this is science, and you don’t want to ban all science, do you?”

They also defend GoF by trotting out examples of research that were beneficial, such as the recent rapid development of Covid-19 vaccines. As was pointed out recently in the biology journal mBio, this is a red herring: it’s just not that difficult to define GoF “research of concern” and distinguish it from other, much more mundane virology and bacteriology research.

In fact, biologists have already done this, in a recent set of proposed new guidelines for regulating GoF research. As Hopkins researcher Tom Inglesby put it, “if you are going to make a more transmissible strain of Ebola, then you need to have the work reviewed by the U.S. government.”

So why do the AI scientists say “please regulate us” while many virologists say “leave our gain-of-function work alone”? It’s not because it’s too hard to define one or the other–if it were, the AI experts wouldn’t even consider regulation as a possibility.

No, it seems that it’s all about money. AI is thriving in both academia and industry, with tremendous growth ahead. The people calling for regulation just aren’t worried about money. They know that AI will continue to thrive, and they are calling for regulation because they seem to have genuine concerns about the threat that AI poses to society.

On the other hand, the world of gain-of-function research is very small, and almost entirely dependent on government funding. Although I’m sure they’ll deny it, these scientists are worried that they’ll lose their grants if even a small portion of GoF research is shut down. They may also be worried about more direct threats to their finances: the conflict-of-interest statement on that recent article by 156 virologists goes on for 731 words. (That is one of the longest conflict-of-interest statements I’ve ever seen on a scientific article.)

I decided to ask an AI (ChatGPT) these questions. When asked about regulating GoF, it replied with a long answer that concluded,

“Ultimately, the decision to regulate gain-of-function research involves weighing the potential risks and benefits. Striking the right balance requires collaboration between scientists, policymakers, and relevant stakeholders to establish guidelines, promote responsible research practices, and implement appropriate oversight mechanisms.”

ChatGPT’s answer about regulating AI was similar, concluding:

“Regulation can play a crucial role in ensuring that AI systems are developed and deployed responsibly... The specific nature and extent of regulation will likely depend on the application and level of risk associated with AI systems. Striking the right balance between regulation and fostering innovation is essential to ensure that AI technology benefits society while safeguarding against potential risks and ethical concerns.”

Overall, not bad advice. Now if only those virologists would listen.

Should we allow AI to control the battlefield? AI itself thinks not.

Artificial Intelligence, or AI, seems to have finally arrived. With the introduction of ChatGPT last November, millions of people suddenly discovered that AI was far, far more than just a research activity. The range and sophistication of ChatGPT’s answers to questions across a wide range of disciplines is, frankly, pretty stunning.

AI is already in lots of places where you might not even realize it. Google Translate has been using AI for years, and it’s remarkably good, although nowhere near as good as a human translator. The technology that Pandora uses to customize your music is a type of AI, as is the technology behind Tesla’s self-driving cars. Within my own field, the program AlphaFold2 was a true breakthrough in scientists’ ability to predict the structure of proteins.

Along with these apparently beneficial developments, though, comes a great deal of concern. As AI gets better and better, can we continue to trust it to make decisions for us? It’s one thing if an AI recommends a song that we don’t like, which we can just ignore, but if AI is driving our cars, then mistakes might be much more costly.

And there are far riskier tasks that AI might be doing soon: what if the military starts using AI on the battlefield? One might be tempted to say no, of course they won’t do that, but what if another country starts using AI this way? Can we simply ignore that?

To be honest, I haven’t been worried, at least not yet. I have a little secret to confess: I did my Ph.D. thesis in AI, in the area called machine learning. I haven’t been doing AI research for the past 25 years, but I think I still understand at least a little of the basics behind the technology.

Thanks mostly to the attention that ChatGPT is getting, though, there’s suddenly a huge concern that AI represents a danger to society, although no one is exactly sure why. President Biden had a meeting just this week with the CEOs of Google, Microsoft, and OpenAI (the creators of ChatGPT) to talk about these concerns, where he told the execs that “What you’re doing has enormous potential and enormous danger.”

Of course, politicians have no idea what AI technology is–although I had to laugh at this quote from the White House press secretary: “The president has been extensively briefed on ChatGPT and knows how it works.” Um, no he doesn’t. I’m skeptical that the CEOs of Google and Microsoft know how it works either. However, you don’t need to understand the technology to understand the dangers.

Just recently, one of AI’s early pioneers, Geoffrey Hinton, has been speaking out to warn the public that AI has reached a point where “there’s a danger of something really bad happening.” Not surprisingly, other experts, such as German scientist Jürgen Schmidhuber, disagree.

The chair of the FTC, Lina Khan, says we can regulate AI effectively, and perhaps we can. But I’m not so sure.

Much of the concern in the press seems to be about the use of AI to spread misinformation. For example, AI can help create “deep fakes” that can be used to convince people that, say, an election was stolen even when it wasn’t. That’s a legitimate concern, but it’s happening even without AI.

A better question, which a family member just asked me, is whether we should be worried about AI making decisions on the battlefield. I thought I should take this question directly to the source, so I asked ChatGPT itself. The answer I got was remarkably clear and direct; I’ll quote it here in full:

“Yes, people should be concerned about AI making decisions on the battlefield. There are several reasons for this.

Firstly, AI lacks human empathy, judgment, and intuition. In a complex and dynamic environment such as the battlefield, human decision-making is often guided by experience, intuition, and an understanding of the broader context. AI lacks the ability to consider these intangible factors and may make decisions solely based on pre-programmed algorithms or data inputs, which could lead to unintended consequences or even catastrophic errors.

Secondly, the use of AI in military decision-making raises significant ethical concerns. For instance, an AI system may be programmed to prioritize the protection of friendly forces at the expense of civilian lives, leading to potentially devastating consequences for non-combatants caught in the crossfire. Additionally, AI systems may be susceptible to hacking or manipulation, which could result in unintended consequences or a loss of control over the decision-making process.

Finally, there is the risk that reliance on AI systems may lead to the erosion of human decision-making capabilities, potentially diminishing the overall quality of military decision-making and increasing the likelihood of errors or unintended consequences.”

So there you have it: ChatGPT itself gives us several reasons why it’s a terrible idea to let AI make decisions that might cost human lives. We should listen to what it’s telling us, because as it gets more intelligent (and it will), it might decide not to be so honest in the future. And I have to say that its reasoning here is both well-written and compelling.

One thing that worries me as well is that the very term “artificial intelligence” is much too broad. We should be concerned about giving any technology the power to control our lives, whatever someone calls it. But when an AI itself says “don’t trust me,” then I think we might want to listen.

No, Google didn't just create AI that could "build your genome"

Most scientists don't have their own PR machine to hype their work. After reading the announcement last week from Google's genomics group, I realized that's probably a good thing.

A Wired article last Friday reported that "Google is giving away AI that can build your genome sequence." Sounds impressive–two high-tech innovations (AI and genomes) in the same title! Unfortunately, the truth is somewhat different. It turns out that Google's new "AI" software is little more than an incremental improvement over existing software, and it might be even less than that.

I'm going to have to get into the (technical) weeds a bit to explain this, but it's the only way to set the record straight. The Wired piece opens with this intriguing challenge:
"Today, a teaspoon of spit and a hundred bucks is all you need to get a snapshot of your DNA. But getting the full picture—all 3 billion base pairs of your genome—requires a much more laborious process."
Interesting, I thought. The writer (Megan Molteni) seems to be talking about genome assembly–the process of taking billions of tiny pieces of DNA sequence and putting them together to reconstruct whole chromosomes. This is something I've been working on for nearly 20 years, and it's a fascinating but very complex problem. (See our recent paper on the wheat genome, as one of dozens of examples I could cite.)
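To give a flavor of what assembly involves, here's a toy sketch in Python: it greedily merges short "reads" by their longest suffix-prefix overlap. Real assemblers deal with billions of error-prone reads and long repeats, so take this only as an illustration of the core idea, not as how any production assembler works.

```python
# Toy genome assembly: repeatedly merge the pair of reads with the
# longest suffix-prefix overlap. Error-free reads, no repeats.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str]) -> str:
    reads = reads[:]
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index i, index j)
        for i in range(len(reads)):
            for j in range(len(reads)):
                if i != j:
                    olen = overlap(reads[i], reads[j])
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left; just concatenate what remains
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)]
        reads.append(merged)
    return "".join(reads)

# Reads sampled from the sequence "ATGGCGTGCAAT":
print(greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"]))  # → ATGGCGTGCAAT
```

Even this toy version is quadratic in the number of reads per merge; scaling the idea to billions of reads is a big part of why assembly stays an active research area.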

So does Google have a new genome assembly program, and is it based on some gee-whiz AI algorithm?

No. Not even close. Let's look at some of the ways that the Google announcement and the Wired article are misleading, over-hyped, or both.

1. The Google program doesn't assemble genomes. That's right: even though the Wired piece opens with the promise of "getting the full picture" of your genome, the new Google program, DeepVariant, doesn't do anything of the sort. DeepVariant is a program for identifying small mutations, mostly changes of a single letter (called SNPs). (It can find slightly larger changes too.) This is known as variant calling, or SNP calling, and it's been around for more than a decade. Lots of programs can do this, and most of them do it very well, with accuracy exceeding 99.9%.
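For readers who want a concrete picture of what variant calling means, here's a deliberately simplified sketch: pile up the aligned reads over a reference sequence, and call a SNP wherever the consensus base disagrees with the reference. Real callers such as GATK or Samtools model base qualities, mapping errors, and diploid genotypes; none of that appears in this toy version.

```python
# Toy SNP caller: tally the read bases covering each reference
# position and report positions where the consensus differs.

from collections import Counter

def call_snps(reference: str, reads: list[tuple[int, str]], min_depth: int = 2):
    """reads: (start position, sequence) pairs, already aligned to the reference."""
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1
    snps = []
    for pos, counts in enumerate(pileup):
        if sum(counts.values()) >= min_depth:      # require minimal coverage
            consensus, _ = counts.most_common(1)[0]
            if consensus != reference[pos]:
                snps.append((pos, reference[pos], consensus))
    return snps

reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (1, "CGAACG"), (2, "GAACGT")]
print(call_snps(reference, reads))  # → [(3, 'T', 'A')]
```

The hard part in practice isn't this bookkeeping; it's deciding whether a disagreement is a real mutation or a sequencing error, which is where the statistics (or, in DeepVariant's case, the neural network) comes in.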

How could Wired get this so wrong? Well, the Wired piece is based on a Google news release from a few days earlier, called "DeepVariant: Highly Accurate Genomes With Deep Neural Networks," written by the authors of the software itself. Those authors, who obviously know what their own software does, make the misleading statement that DeepVariant is
"a deep learning technology to reconstruct the true genome sequence from HTS sequencer data with significantly greater accuracy than previous classical methods."
If you read on, though, you quickly learn that DeepVariant is just a variant caller (as the name implies). This software does not "reconstruct the true genome sequence." That's just wrong. To reconstruct the sequence, you would need to use a program called a genome assembler, a far more complex algorithm. (I should add that many genome assemblers have been developed, and it's an active and thriving area of research. But I digress.)

The Wired article also points out that
"the data produced by today’s [sequencing] machines still only produce incomplete, patchy, and glitch-riddled genomes."
Yes, that's true. Again, though, DeepVariant does nothing to fix this problem. It can't assemble a genome, and it can't improve the assembly of an "incomplete, patchy" genome.

2. Wild hyperbole: the caption on the lead image in the Wired piece says "Deep Variant is more accurate than all the existing methods out there."  The Google press release, presumably the source for that caption, claims that DeepVariant has "significantly greater accuracy than previous classical methods."

No, it does not. This is the kind of claim you'd never get away with in a scientific paper, not unless you rigorously demonstrated your method was truly better than everything else. The Google team hasn't done that.

How good is it? First, let me remind you that variant calling programs have been around a long time, and they work very well. An incremental improvement would be nice, but not "transformative" or a "breakthrough"–words that the Google team didn't hesitate to use in their press release. They also used the word "significant," which in a scientific paper would require statistics to back it up. Press releases can throw around dramatic claims like these without anyone to check them. That's not a good thing.

About a year ago, the Google team released a preprint on bioRxiv that shows that their method is more accurate (on a limited data set) than an earlier method called GATK, which was developed by the same author, Mark DePristo, in his former job at MIT, which he left to join Google. GATK is quite good, and is very widely used, but other, newer methods are much faster and (at least sometimes) more accurate. The Google team basically ignored all of the other variant calling programs, so we just don't know if DeepVariant is better or worse than all of them. If they want to get this preprint published in a peer-reviewed journal, they're going to have to make a much better case.

(As an aside: a much less hyped method called 16GT, published earlier this year by a former member of my lab, Ruibang Luo, is far faster than DeepVariant, just as accurate, and runs on commodity hardware, unlike DeepVariant, which requires special resources only available in the Google Cloud. And it does all this with math and statistics–no AI required. But I digress.)

(Another aside: if we really wanted to get into the weeds, I would explain here that the "AI" solution in DeepVariant is a transformation of the variant calling problem into an image recognition problem, which the program then solves with a deep neural network. I have serious reservations about this approach, but suffice it to say that there's no particular reason why treating the problem as an image recognition task would provide a large boost over existing methods.)
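To make that aside a little more concrete, here's a rough, hypothetical sketch of the pileup-to-image idea: stack the reference and the aligned reads at a candidate site into a small pixel array that an image classifier could consume. The encoding below (one made-up gray value per base) is purely illustrative; DeepVariant's actual encoding uses multiple channels for base quality, strand, and more.

```python
# Hypothetical illustration of encoding a read pileup as an image:
# one row per sequence, one gray value per base. NOT DeepVariant's
# real encoding, just the general shape of the idea.

import numpy as np

BASE_TO_PIXEL = {"A": 50, "C": 100, "G": 150, "T": 200, "-": 0}  # arbitrary values

def pileup_to_image(reference: str, reads: list[str]) -> np.ndarray:
    """Stack the reference and aligned reads into a 2-D uint8 pixel array."""
    rows = [reference] + reads
    return np.array(
        [[BASE_TO_PIXEL[base] for base in row] for row in rows],
        dtype=np.uint8,
    )

img = pileup_to_image("ACGT", ["ACAT", "AC-T", "ACAT"])
print(img.shape)  # → (4, 4)
```

Once the pileup looks like an image, an off-the-shelf convolutional network can be trained to label each candidate site as a true variant or not; my skepticism is about whether that detour through images buys anything over modeling the pileup counts directly.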

3. More wild hyperbole. The Google news release opens with a sentence containing this:
"in the field of genomics, major breakthroughs have often resulted from new technologies."
It then goes on to describe several true breakthroughs in DNA sequencing technology, such as Sanger sequencing and microarrays, none of which had any contribution from the Google team. Then–pause for a deep breath and a paragraph break–we learn that "today, we announce the open source release of DeepVariant." Ta-da!

I can only shake my head in wonder. Does the Google team truly believe that DeepVariant is a breakthrough on a par with Sanger sequencing, which won Fred Sanger the 1980 Nobel Prize in Chemistry? This is breathtakingly arrogant.

4. DeepVariant is computationally inefficient. Even if it is better than earlier programs (and I'm not convinced of that), DeepVariant is far slower. While other programs run on commodity hardware, it appears that Google's DeepVariant requires a large, dedicated grid of computers working in parallel. The Wired article explains that two companies (DNAnexus and DNAStack) had to invest in new GPU-based computer hardware in order to run DeepVariant. An independent evaluation found that DeepVariant was 10 to 15 times slower than the competition. Coincidentally, perhaps, Google's press release also announces the availability of the Google Cloud Platform for those who want to run DeepVariant.

No thanks. My lab will continue to use 16GT, or Samtools, or other variant callers that do the job much faster, and just as well, without the need for the Google Cloud. As a colleague remarked on Twitter, the "magic pixie dust of 'deep learning' and 'google'" doesn't necessarily make something better.

Genomics is indeed making great progress, and although I applaud Google for dedicating some of its own scientific efforts to genomics, it's not helpful to exaggerate what they've done so far, especially to this degree. Both the Google news release and the Wired article contain the sort of overstatements that make the public distrust science reporting. We don't need to do that to get people excited about science.