Samiha Sharlin, Ph.D. Candidate
Advisor:
Tyler Josephson
Title: ADVANCING MOLECULAR SCIENCE WITH MONTE CARLO SIMULATIONS AND LARGE LANGUAGE MODELS
Abstract:
Computational methods and artificial intelligence are reshaping scientific discovery, yet important gaps still limit their impact in practice. This dissertation addresses two of these gaps: (1) molecular simulations rarely capture adsorption processes under conditions relevant to water treatment, and (2) large language models (LLMs) have not been rigorously assessed for scientific reasoning. To close them, it develops new approaches that advance molecular simulation and artificial intelligence for science.
Through methodological innovations in molecular simulation, the thesis introduces a computational workflow that models the adsorption of 1,4-dioxane at concentrations as low as 0.35 parts per billion, a level that, to the best of our knowledge, is lower than any previously reported in molecular simulations. We further develop a new Monte Carlo trial move for efficiently sampling chemisorption, which can model proton transfer between bulk water and Brønsted acid sites in zeolites. The new scheme introduces a distance bias that promotes interactions between reactant species and is about 70 times faster than the standard approach for a single, fixed reaction site in a slit pore.
Furthermore, the thesis leverages LLMs to discover equations directly from data while incorporating contextual cues provided in natural language. A purely data-fitted expression can be unphysical upon interpolation, whereas LLM-generated expressions tend to be more physically meaningful. We also evaluate LLMs on reasoning-heavy tasks in chemistry, such as interpreting NMR spectra to identify molecules. State-of-the-art language models outperformed top-quartile undergraduate students on the NMR-reasoning benchmark. Nevertheless, LLMs still make basic logical errors and are not yet reliable as fully autonomous agents.
The simulations extend molecular modeling to contaminant concentrations relevant to environmental regulations and provide a transferable approach for sampling reactive adsorption in porous materials. The AI benchmarks establish metrics for assessing and improving LLM reasoning in chemistry, highlighting cases where language-based tools can assist scientific discovery. Together, these contributions broaden the computational toolkit for future research that models molecular phenomena with AI-augmented systems.
LOCATION: Information Technology and Engineering Building (ITE) Room 459
Agenda:
12:45 pm EST - Welcome / Room Opens
1:00 pm EST - Presentation followed by questions from the audience.
After questions from the public, the meeting will be closed for committee discussion.
WEBEX Meeting for virtual attendance:
https://umbc.webex.com/umbc/j.php?MTID=m3707b3b9b096c32f8f2b267354b265ae
Meeting number (access code): 2863 563 1099
Meeting password: xJNzsyJP365
TAP TO JOIN FROM A MOBILE DEVICE (ATTENDEES ONLY)
+1-202-860-2110,,28635631099## United States Toll (Washington D.C.)
JOIN BY PHONE
+1-202-860-2110 United States Toll (Washington D.C.)
Global call-in numbers
https://umbc.webex.com/umbc/globalcallin.php?MTID=m9761ec30e13404dfc76baae4a2d522c7
Can't join the meeting?
https://collaborationhelp.cisco.com/article/WBX000029055