The GPT-5 model in scientific research: uses, advances, and limitations

Last update: January 29, 2026
  • GPT-5 and GPT-5.2 improve scientific and mathematical reasoning, with leading results in benchmarks such as GPQA Diamond and FrontierMath.
  • Models act as research co-pilots: they help solve open problems, optimize experiments, and analyze literature, but they require human verification.
  • Their adoption extends to medicine, wet laboratories, universities, and businesses, boosting productivity but posing ethical, safety, and regulatory challenges.

GPT-5 model in scientific research

The leap made by GPT-5 and GPT-5.2 in scientific research is redefining the way science is done. From the most theoretical mathematics to wet lab experiments, and spanning biology, physics, medicine, and advanced materials science, these models don't just write texts; they have begun to be used as true research co-pilots, capable of suggesting hypotheses, helping to design experiments, and finding patterns in data that would take a person months to identify.

At the same time, OpenAI and the rest of the scientific ecosystem are very clear on one key point: GPT-5 is not an “autonomous scientist” nor a substitute for the human scientific method. It functions more as an assistant with immense access to literature, quantitative tools, and structured reasoning capabilities, which can accelerate work but still requires expert supervision, verification, and considerable critical judgment from researchers.

GPT-5 and GPT-5.2: New generations of models for science and mathematics

OpenAI set December 11, 2025 as the key date for the official presentation of GPT-5.2, the version it describes as its most advanced model to date for scientific and mathematical tasks. Over the past year, the company has collaborated closely with researchers in fields such as mathematics, physics, biology, and computer science to gain a practical understanding of where AI delivers real value and where it still falls short.

This work has crystallized into case studies spanning very different disciplines. From astronomy to materials science, GPT-5 and, later, GPT-5.2 have played a role in specific parts of the research workflow: reworking proofs, exploring alternative proof strategies, revising simulation code, synthesizing articles, and proposing minor protocol variations. According to OpenAI, GPT-5.2 is beginning to show improvements that are not merely occasional but more stable and reproducible.

Within the GPT-5.2 family, two variants specialized for science and mathematics stand out: GPT-5.2 Pro and GPT-5.2 Thinking. Both have been optimized for deep reasoning and demanding technical tasks, where a subtle error can ruin an entire analysis. GPT-5.2 Pro prioritizes fidelity and accuracy, allowing for more reasoning time, while GPT-5.2 Thinking focuses on intelligently deciding when to "think" longer and when to respond more quickly.

This philosophy of “step-by-step reasoning” was already present in the design of GPT-5 through its GPT-5 Thinking mode, which acts as an internal router capable of evaluating the complexity of a query, the available context, and the necessary tools (e.g., access to Python) before producing a response. It responds quickly to simple questions; for complex problems, it activates longer and more explicit reasoning chains.

In day-to-day use, users can choose between several GPT-5 reasoning modes: “Auto” lets the model decide how much time to spend on the problem; “Instant” prioritizes speed over depth; “Thinking” provides more considered and analytical answers; and “Pro” is the most accurate and demanding variant, designed for tasks where accuracy takes precedence over speed. It's worth noting that GPT-5 is a paid model, accessible through subscription or pay-per-use, which is especially relevant for institutions managing sensitive data or working with tight research budgets.
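The routing idea described above can be illustrated with a toy sketch. Everything here is invented for illustration: the heuristics, thresholds, and keyword lists are assumptions, and OpenAI's actual router is not public.

```python
# Toy illustration of a "reasoning router": decide how much effort to
# spend on a query based on crude complexity signals. The heuristics
# and thresholds below are invented; OpenAI's real router is not public.

def route_query(query: str, tools_available: bool = False) -> str:
    """Return a reasoning mode: 'instant', 'thinking', or 'pro'."""
    signals = 0
    if len(query.split()) > 40:  # long, detailed prompts suggest harder work
        signals += 1
    if any(k in query.lower() for k in ("prove", "derive", "simulate", "optimize")):
        signals += 1  # technical verbs hint at deep reasoning
    if tools_available:
        signals += 1  # tool use (e.g., Python) implies analysis
    if signals >= 2:
        return "pro"
    if signals == 1:
        return "thinking"
    return "instant"

print(route_query("What is the boiling point of water?"))
print(route_query("Prove the estimator is unbiased", tools_available=True))
```

The point of the sketch is the design choice, not the heuristics: cheap signals decide upfront how much compute a query deserves, so simple questions stay fast while hard ones get longer reasoning chains.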

Performance in benchmarks: GPQA, FrontierMath and FrontierScience

The improvement of GPT-5.2 in scientific research is not based solely on subjective impressions, but also on results in specialized benchmarks. One of the most cited is GPQA Diamond, a set of multiple-choice questions at the postgraduate level covering physics, chemistry, and biology, designed to measure advanced reasoning rather than mere memorization.

In GPQA Diamond, GPT-5.2 Pro achieved a 93.2% success rate and GPT-5.2 Thinking a 92.4% success rate. Both results were obtained without external tools and with reasoning effort set to maximum, so the model has to solve the problems "on its own," relying solely on its internal analytical capabilities. These figures clearly place it above previous generations and solidify its role as an assistant in very high-level problem-solving and comprehension tasks.

Another benchmark is FrontierMath (Tier 1–3), an advanced mathematics assessment that allows the use of a Python tool. In this scenario, GPT-5.2 Thinking solves 40.3% of the problems with maximum reasoning effort, a percentage that, although it may seem modest to the layperson, represents a significant leap forward in an area where most previous models barely achieved useful results.


Beyond the numbers, OpenAI insists that these advances reflect an improvement in overall capacity for abstraction and reasoning, not merely a narrow skill optimized for a single benchmark. The company directly relates these capabilities to everyday workflows in science: programming simulations, statistical data analysis, designing and refining experiments, and interpreting results.

In parallel, OpenAI has introduced a broader evaluation framework called FrontierScience. Designed to measure the performance of models like GPT-5 on genuinely novel scientific problems that are not part of the training data, FrontierScience includes challenges in biology, chemistry, physics, mathematics, computer science, and social sciences, built to demand not only theoretical knowledge but also planning, critical thinking, and generalization.

Initial analyses show that GPT-5 performs very well when the task can be broken down into clear, logical steps, while it continues to struggle when asked for creative intuition or a deep understanding of the experimental context. This aligns with the increasingly widespread view among AI experts: current generative models are powerful support tools, but they do not replace the creativity, intuition, or responsibility of the human scientist.

An emblematic case: solving open problems in mathematics

One of the most striking examples of the use of these models in pure science comes from statistical learning theory, where GPT-5.2 Pro helped to resolve an open problem related to the monotonicity of learning curves for maximum likelihood estimators. The underlying question is intuitive: when we add more data to a correctly specified statistical model, should the expected error always decrease, or might it worsen, at least in some segments?

Previous research had shown that, under certain practical conditions, the learning curve is not always monotonic and that adding data can counterintuitively increase the error. This line of research dates back to a problem posed in 2019 at the Conference on Learning Theory (COLT) by Viering, Mey, and Loog, which triggered numerous subsequent articles with concrete examples and strategies for recovering monotonicity.

Despite these advances, one standard case, considered almost "textbook," remained unresolved: a Gaussian model with a known mean and unknown standard deviation, where the statistical model is correct and the data follow an idealized normal distribution. For this classic scenario, the new work concludes that traditional intuition holds: more data does indeed imply a monotonically decreasing expected error.
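The flavor of this result can be illustrated numerically. The sketch below is a Monte Carlo illustration, not the paper's proof: it estimates the expected squared error of the maximum likelihood estimator of the variance for a zero-mean Gaussian at several sample sizes, and the estimate shrinks as n grows, matching the monotone behavior the new work establishes for this textbook case.

```python
# Monte Carlo illustration of a monotone learning curve: expected squared
# error of the MLE of sigma^2 for a Gaussian with known mean 0, as a
# function of sample size n. Illustrative only, not the paper's proof.
import random

def expected_sq_error(n: int, sigma: float = 1.0,
                      trials: int = 5000, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # With known mean 0, the MLE of sigma^2 is the mean of x_i^2.
        est = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n
        total += (est - sigma ** 2) ** 2
    return total / trials

for n in (5, 20, 80):
    print(f"n={n:3d}  E[(sigma2_hat - sigma2)^2] ~ {expected_sq_error(n):.4f}")
# Theory: the error equals 2*sigma^4/n, so it decreases monotonically in n.
```

For sigma = 1 the theoretical error is 2/n (0.4, 0.1, 0.025 for n = 5, 20, 80), and the simulated values track it closely.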

The key contribution of the study, as OpenAI explains, lies not only in the result but in the process. Instead of guiding the model step-by-step with a detailed proof scheme, the authors presented the open problem directly to GPT-5.2 Pro and meticulously analyzed the proof it generated. They then validated the argument with external experts in the field, thoroughly reviewed each step, and, once the result was consolidated, used the model to extend it to higher dimensions and other common statistical models.

This approach aptly illustrates the type of collaboration now emerging between humans and AI in theoretical research: the model suggests possible proof paths, while humans act as rigorous referees, correcting, refining, and deciding what is accepted as a valid contribution. There is no blind delegation, but rather a combination of automated exploration and expert scrutiny.

GPT-5 as research co-pilot: from Erdős number to the wet lab

Beyond theoretical statistics, GPT-5 has featured in other high-profile use cases. OpenAI, for example, has published a paper in which its model helps solve a complex open problem in number theory related to Erdős's legacy, in collaboration with a mathematician from Columbia University. The model helped explore conjectures, verify intermediate steps, and propose alternative approaches that proved fruitful.

Another example that has attracted a lot of attention is the identification, in a matter of minutes, of a specific change in human immune cells, a task that had consumed months of effort for a team of scientists. GPT-5 proposed a specific experiment to test a hypothesis about this change; the researchers ran the experiment and confirmed that the suggestion was correct, significantly shortening the usual trial-and-error cycle.

These results are part of a broader movement by the technology industry towards the scientific sector. Anthropic, for example, has announced the integration of its chatbot Claude into tools used by research groups and life sciences companies. Google, for its part, has introduced an AI "co-scientist" designed to formulate new hypotheses and has highlighted that its open-source Gemma model contributed to the discovery of a potential new avenue for cancer therapies.


OpenAI, meanwhile, has created a dedicated scientific unit and has recruited figures such as Alex Lupsasca, known for his theoretical work on black holes. Among the company's plans is developing a kind of "automated AI research intern" in the short term and, looking further ahead, a virtually automated research tool within a few years, always under the premise of keeping the human researcher at the center of the process.

In the wet lab, GPT-5 and its successors have been tested as assistants for optimizing experimental protocols. Based on relevant literature and previous data, the model can suggest temperature conditions, incubation times, reagent dosages, or combinations of controls and replicates. In several reported cases, small adjustments suggested by the model have improved the yield of chemical reactions or significantly reduced the time required to obtain useful results.

Use of GPT-5 in medicine and clinical practice

One of the fields where GPT-5 is showing a very tangible practical impact is medicine, both in clinical practice and in clinical research. To begin with, the model has become established as a tool for analyzing complex clinical reports (laboratory tests, imaging studies, postoperative reports, etc.), generating condensed summaries of key findings that save professionals time.

The procedure is simple: the doctor or researcher enters the text of the report, or an image of the document, and requests a summary or the extraction of the most relevant points. GPT-5 returns a condensed report highlighting possible diagnoses, critical findings, and follow-up recommendations, always under the premise that the healthcare professional must review and validate the information before making any decisions.
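A request like this could be assembled as in the sketch below. The message format follows the common chat-API pattern and the model identifier is an assumption, not a confirmed API name; real reports must be anonymized before being sent anywhere.

```python
# Sketch of assembling a clinical-report summary request. The model name
# and message layout follow the common chat-API pattern; treat both as
# assumptions. Never send identifiable patient data to an external API.

def build_summary_request(report_text: str, model: str = "gpt-5.2") -> dict:
    """Build a chat-style request asking for key findings from a report."""
    system = (
        "You are an assistant for clinicians. Summarize the report, "
        "listing key findings, possible diagnoses, and follow-up "
        "recommendations. A healthcare professional must validate everything."
    )
    return {
        "model": model,  # assumed model identifier
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": f"Report:\n{report_text}"},
        ],
    }

req = build_summary_request("CBC: mild anemia. Chest X-ray: no acute findings.")
print(req["messages"][1]["content"])
```

Keeping the validation disclaimer in the system prompt mirrors the article's premise: the model drafts, the professional decides.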

Another powerful application is the generation of high-quality medical content, from clinical summaries to drafts of scientific articles or informational materials for patients. Starting from a few instructions in natural language (for example, "write a summary about a patient with persistent fever and myalgia"), the model produces coherent, well-structured texts that professionals can edit and adapt to their needs. AI-generated medical content can speed up writing, always subject to human review.

GPT-5 can also suggest differential diagnoses based on the symptoms and history described by the practitioner. It does not replace clinical judgment, but it offers a reasoned list of possibilities, complementary tests to consider, or red flags that should be ruled out. For a 50-year-old patient with fatigue, dry cough, and shortness of breath, for example, the system can list probable diagnoses and suggest studies such as chest X-rays, blood tests, pulmonary function tests, or viral tests.

In terms of personalized care, GPT-5 helps adjust treatment plans and prevention strategies to the patient's profile, provided that the data is entered anonymously and with strict respect for privacy. For a 70-year-old patient with hypertension, type 2 diabetes, and chronic kidney disease, for example, the model can outline integrated management strategies, risk factor control, lifestyle recommendations, and long-term follow-up guidelines based on clinical practice guidelines.

Finally, GPT-5 is being used as an intelligent search engine for medical literature. The professional poses a question in natural language ("what recent studies are there on telemedicine in chronic diseases?") and the model locates and summarizes relevant works, helping professionals stay up to date without having to manually comb through endless databases. Tools like NotebookLM similarly facilitate organizing and summarizing the literature.

Quality of responses, hallucinations, and safety

A recurring criticism of previous generations of models, such as o3 and o3-pro, has been their tendency towards hallucinations: citing real articles but drawing erroneous conclusions or incorrect extrapolations from them. Researchers working on polymers for materials science or on biological signaling pathways have reported that GPT-5 clearly improves on this behavior, citing more relevant literature and offering interpretations better aligned with the original texts.

OpenAI's technical paper indicates that GPT-5 significantly reduces factual errors compared to GPT-4 and its own o3 model, especially when deep reasoning mode is activated. In controlled settings, a reduction of approximately 45% compared to GPT-4 and up to 80% compared to o3 is reported on certain tasks, thanks to a combination of improved training, internal verification techniques, and more careful design of safety policies.


Even so, OpenAI's own article acknowledges that GPT-5 continues to make incorrect assumptions or fabricate data, even when it sounds very certain. That's why the company insists, as do many academics, that every output from the model should be treated as a hypothesis to be tested, not as an absolute truth. In scientific research, where reproducibility and verifiability are sacrosanct, this distinction is fundamental.

The issue of safety goes beyond technical and scientific accuracy. Access to powerful models like GPT-5 could, without adequate controls, facilitate the dissemination of sensitive knowledge in biosafety, hazardous chemistry, and other delicate fields. This has led to an international debate on controlled-access models, logging and auditing, request traceability, and multi-level safety filters. Tools such as browser extensions that identify AI-generated content are part of this mitigation ecosystem.

Organizations that use GPT-5 for research should coordinate with legal teams, data protection officers, and ethics committees. Roles such as legal specialists in healthcare institutions and data protection officers play a central part in ensuring compliance with regulations, confidentiality of information, and responsible management of AI-assisted results.

New skills for researchers, universities and companies

Adopting GPT-5 in scientific research is not just about installing a new tool, but about acquiring new skills. Researchers must learn to formulate effective prompts, critically interpret responses, document the model's role in the process, and integrate its suggestions into experimental or theoretical protocols without losing traceability. Resources on formulating effective prompts and personalizing the interaction are key.

Universities and research institutes are beginning to update their training programs to incorporate modules on AI literacy, ethics, algorithmic bias, data protection, and intellectual property generated with the support of models such as GPT-5. This affects not only STEM fields; it also affects social sciences and humanities, where AI is used to analyze large corpora of text, surveys, or historical data.

Funding agencies and foundations that support scientific projects will also have to set clear rules on the use of GPT-5 in proposals, articles, and reports. These include stating clearly whether AI has been used, specifying the model version, detailing how the results have been validated, and recording which part of the work is genuinely human and which has been assisted by the system.

In parallel, GPT-5 has a direct impact on marketing, business, and scientific communication. Biotechnology, medtech, or deep tech companies can use it to analyze customer data, generate specialized content, automate complex responses, and translate research findings into understandable messages for investors, partners, or the general public.

Platforms like SendApp explore precisely this intersection between advanced AI and conversational channels. Connecting GPT-5 with WhatsApp Business via official APIs allows, for example, a laboratory to communicate its latest results to partners, manage technical inquiries from international clients, or automate part of its scientific outreach while maintaining a consistent, professional tone.

For teams handling large volumes of interaction, integrating GPT-5 into conversation management systems can improve efficiency. The model suggests responses, classifies requests, summarizes technical documentation, and feeds intelligent chatbots capable of maintaining context, always with the possibility of a human reviewing or taking control when the situation requires it.
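The classify-and-hand-off pattern can be sketched in a few lines. The categories and keyword lists below are invented for illustration; a real pipeline would use the model itself for classification, keeping only the human-review flag as a hard rule.

```python
# Toy triage sketch for a conversation-management pipeline: classify an
# incoming request and flag when a human should take over. Categories
# and keywords are invented for illustration only.

def triage(message: str) -> dict:
    text = message.lower()
    # Hard rule: sensitive topics always go to a human reviewer.
    if any(k in text for k in ("complaint", "adverse", "urgent", "legal")):
        return {"category": "escalate", "human_review": True}
    if any(k in text for k in ("price", "quote", "invoice")):
        return {"category": "sales", "human_review": False}
    if any(k in text for k in ("protocol", "reagent", "results", "method")):
        return {"category": "technical", "human_review": False}
    return {"category": "general", "human_review": False}

print(triage("Urgent: adverse reaction reported in batch 12"))
print(triage("Could you send a quote for the assay kit?"))
```

The design point matches the article: automation handles routine routing, but the human-review flag guarantees a person steps in when the situation requires it.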

Taken together, these uses position GPT-5 and GPT-5.2 as central pieces of a new way of doing science. In this approach, the models act as idea generators, facilitators of exhaustive literature searches, support for mathematical proofs, and virtual laboratory assistants. Ultimate responsibility remains with scientists, clinicians, and human teams, but the speed of testing hypotheses, exploring alternative paths, and connecting disparate results is multiplied, ushering in an era where five years of work with well-integrated AI could be equivalent to decades of progress at the traditional pace.
