Q: Which GenAI should we (fully) trust? A: None!*

* Experiment-based analysis

Recently, I stumbled upon an interesting puzzle that got me thinking: How long would it take to empty a bucket of water if it lost half of its remaining water every hour? The answer might surprise you.

And since so many things in our lives are now connected to AI in one way or another, I decided to see how Microsoft Copilot would respond to a more focused, altered version of the original puzzle. I used this simple prompt:

A bucket has 5 liters of water. 50% of the water is taken from it every hour. How many hours will it take until it reaches the smallest amount of water, which is one single H₂O molecule? Explain the answer in steps.

After a lengthy, step-by-step explanation, Copilot determined that the answer is 86.62 hours.

To tell you the truth, I was persuaded by the detailed scientific analysis.

But since I enjoy experimenting and analyzing things, I wondered what would happen if I gave the other well-known generative AIs the same prompt. I assumed there should be no room for hallucinations when it comes to scientific facts, so logically they should all give the same answer.

However, I was disappointed to find that every GenAI I tested gave me an entirely different response, as you can see in this table! (I left out Gemini's answer of 90 hours because it completely ignored my request to explain how it reached its result.)

Could someone, ideally from the people who own those intelligent tools, explain the reason behind the stark differences between them?

I tested the exact prompt of this theoretical exercise on the free web versions of each of the AIs, so please do not suggest changing it or using the “more accurate” paid versions. Most people in the world are not tech-savvy, so they will simply rely on their preferred GenAI to get what they need with little effort or analysis, or without even thinking about it.

And now, the burning question remains: Which of the ever-evolving generative AI systems can we place our trust in?

Solving puzzles like the one I mentioned here isn’t life-critical, but in fields where accuracy is essential, it’s advisable to double-check results using more than one AI tool and compare the findings with trusted sources. (You might ask, why not do this from the beginning without using an AI?) Some might suggest using a more powerful, paid version of an AI tool, but I’m not sure if the results would be consistent across different systems.
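
For a puzzle like this one, an independent check takes only a few lines of code. The following is a minimal sketch, not a definitive answer: it assumes a water density of 1 kg per liter, a molar mass of 18.015 g/mol, and a continuous halving model in which the number of hours is the base-2 logarithm of the starting number of molecules. The function names are made up for illustration.

```python
import math

# Assumed constants (not given in the original prompt).
AVOGADRO = 6.02214076e23      # molecules per mole (exact SI definition)
MOLAR_MASS_WATER_G = 18.015   # grams per mole of H2O (assumed value)
DENSITY_G_PER_L = 1000.0      # grams per liter (assumes 1 kg per liter)


def molecules_in_water(liters: float) -> float:
    """Estimate how many H2O molecules are in the given volume of water."""
    grams = liters * DENSITY_G_PER_L
    moles = grams / MOLAR_MASS_WATER_G
    return moles * AVOGADRO


def hours_to_one_molecule(liters: float, fraction_removed: float = 0.5) -> float:
    """Hours until one molecule is left, removing a fixed fraction each hour.

    Solves n * (1 - fraction_removed) ** t = 1 for t.
    """
    n = molecules_in_water(liters)
    return math.log(n) / -math.log(1.0 - fraction_removed)


if __name__ == "__main__":
    hours = hours_to_one_molecule(5.0)
    print(f"Molecules in 5 liters: {molecules_in_water(5.0):.3e}")
    print(f"Hours (continuous model): {hours:.2f}")
    print(f"Whole hourly halvings needed: {math.ceil(hours)}")
```

Under these assumptions the result is roughly 87 hours, or 88 whole halvings if only complete hours count. The exact figure depends on the constants chosen and on how "reaches one molecule" is interpreted, which is one reason to state such assumptions explicitly when comparing answers.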

If you’re unable to implement any of the above solutions, I am sorry to say that you should be prepared to face the consequences of your decision. 🙂

https://www.linkedin.com/in/waleedalasfar/

Waleed AlAsfar

The information provided on this topic is not a substitute for professional advice, and you should consult with a qualified professional for specific advice that is tailored to your situation. While we strive to ensure the accuracy and timeliness of the information provided, we do not make any warranties or representations of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the information, products, services, or related graphics for any purpose. Any reliance you place on this information is at your own risk. We cannot be held liable for any consequences that may arise from the use of this information. It is always advisable to seek guidance from a qualified professional.


AI has helped in writing this article

