Evaluating AI

“Despite their impressive capabilities, generative AI models often produce content that is incorrect, misleading, or not directly based on their training data, a phenomenon sometimes referred to by experts as ‘hallucinations’ or fabrications.” – Laflamme & Bruneault, 2025, p. 496


AI hallucination occurs when an artificial intelligence system, such as ChatGPT, produces information that is factually incorrect, fabricated, or unsupported by any source, even though it sounds confident and plausible.

For example, the AI might invent a quote, cite a non-existent academic article, or give an incorrect answer based on patterns rather than facts. Hallucinations happen because AI generates responses based on language patterns, not true understanding or verified data.

Unlike a search engine such as Google, AI tools do not search the internet in real time or access live sources to find answers. They generate content based on patterns learned from past training data.

In academic work, using hallucinated content can lead to misinformation or violations of academic integrity, including plagiarism.

Don’t forget to double-check what you get from ChatGPT. Sometimes it sounds right but includes made-up details. This is called a “hallucination.”

Evaluating AI Tools & Output

🧠 Evaluating AI Outputs: Use these questions to help assess the reliability and usefulness of AI-generated content.

🎓 Authority
  • Does the AI cite a credible source or author for the information?
  • Can you verify the information independently from reliable sources?

✅ Accuracy
  • Is the information factually correct and complete?
  • Are there errors, invented citations (“hallucinations”), or outdated data?

⚖️ Objectivity
  • Does the output show bias or lean toward a particular viewpoint?
  • Are assumptions or stereotypes present in the language or examples?

📅 Date of Publication
  • Is the information current or based on recent knowledge?
  • Is currency important for your topic? (AI tools may not be up-to-date.)

📚 Coverage
  • Does the output address the topic fully or only skim the surface?
  • Does it miss key perspectives or oversimplify complex issues?

🧰 Usefulness
  • Is the content relevant to your research question or assignment?
  • Is the format helpful (e.g., clear structure, examples, explanations)?

 

When using artificial intelligence, it is important to critically evaluate both the tool itself and its output. Ask yourself these questions:

  • What is the purpose of the tool?
  • How is this tool funded? Does the funding impact the credibility of the output?
  • What, if any, ethical concerns do you have about this tool? 
  • Does the tool ask you to upload existing content such as an image or paper? If so, are there copyright concerns? Is there a way to opt out of including your uploaded content in the training corpus?
  • What is the privacy policy? If you are assigning this tool in a class, be sure to consider any FERPA concerns. Faculty may also reach out to the Office of Academic Technology for guidance. 
  • What corpus or data was used to train the tool, or what data is the tool accessing? Consider how comprehensive the data set is (for example, does it include paywalled information such as that in library databases and electronic journals?), whether it is current enough for your needs, any bias in the data set, and algorithmic bias.
  • If reproducibility is important to your research, does the tool support it?
  • Is the information the tool creates or presents credible? Because generative AI generates content as well as or instead of returning search results, it is important to read across sources to determine credibility.
  • If any evidence is cited, are the citations real or "hallucinations" (made-up citations; see the glossary)? For a quick programmatic way to screen cited DOIs, see the sketch after this list.
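
One practical way to screen AI-supplied citations is to check whether a cited DOI actually resolves in a citation registry. The sketch below is a minimal example, assuming Python with the `requests` library and the public Crossref REST API; the contact email in the User-Agent header is a hypothetical placeholder. A missing record suggests a fabricated citation, but you should always confirm by locating and reading the source itself.

```python
# Minimal sketch: look up a cited DOI in the public Crossref REST API.
# Assumes the `requests` library is installed (pip install requests).
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks polite clients to identify themselves;
        # the mailto address here is a hypothetical placeholder.
        headers={"User-Agent": "citation-check/0.1 (mailto:you@example.edu)"},
        timeout=10,
    )
    return resp.status_code == 200  # 200 = record found; 404 = no such DOI

# Example: a well-known real DOI (Watson & Crick, 1953) vs. an invented one.
print(doi_exists("10.1038/171737a0"))        # expected: True
print(doi_exists("10.1234/not-a-real-doi"))  # expected: False
```

Note that a DOI that resolves only shows the reference exists; the AI may still have misattributed it or misstated its findings, so check the claim against the actual text.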

Attribution