Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3

Kavli Affiliate: Paul Alivisatos

| First 5 Authors: Nicholas Walker, John Dagdelen, Kevin Cruse, Sanghoon Lee, Samuel Gleason

| Summary:

Although gold nanorods have been the subject of much research, the pathways
for controlling their shape and thereby their optical properties remain largely
heuristically understood. Although it is apparent that the simultaneous
presence of and interaction between various reagents during synthesis control
these properties, computational and experimental approaches for exploring the
synthesis space can be either intractable or too time-consuming in practice.
This motivates an alternative approach leveraging the wealth of synthesis
information already embedded in the body of scientific literature by developing
tools to extract relevant structured data in an automated, high-throughput
manner. To that end, we present an approach using the powerful GPT-3 language
model to extract structured multi-step seed-mediated growth procedures and
outcomes for gold nanorods from unstructured scientific text. GPT-3 prompt
completions are fine-tuned to predict synthesis templates in the form of JSON
documents from unstructured text input with an overall accuracy of 86%. The
performance is notable, considering the model is performing simultaneous entity
recognition and relation extraction. We present a dataset of 11,644 entities
extracted from 1,137 papers, resulting in 268 papers with at least one complete
seed-mediated gold nanorod growth procedure and outcome for a total of 332
complete procedures.
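
To make the idea of a JSON synthesis template concrete, the following is a minimal sketch of how a model completion might be parsed and checked for completeness. The field names (`seed_solution`, `growth_solution`, `outcome`), the helper functions, and the example record are all hypothetical and chosen only for illustration; the paper's actual template schema and completeness criteria are not given in this summary.

```python
import json

# Hypothetical section names: the paper's real template schema is not
# specified in the summary above, so these are illustrative assumptions.
REQUIRED_SECTIONS = ("seed_solution", "growth_solution", "outcome")


def parse_completion(completion: str):
    """Parse a model completion as a JSON object; return None if malformed."""
    try:
        doc = json.loads(completion)
    except json.JSONDecodeError:
        return None
    return doc if isinstance(doc, dict) else None


def is_complete(procedure: dict) -> bool:
    """Treat a procedure as complete if every required section is non-empty."""
    return all(procedure.get(section) for section in REQUIRED_SECTIONS)


# Illustrative placeholder record, not data from the paper.
example_completion = json.dumps({
    "seed_solution": [{"name": "HAuCl4"}, {"name": "CTAB"}, {"name": "NaBH4"}],
    "growth_solution": [{"name": "HAuCl4"}, {"name": "CTAB"},
                        {"name": "AgNO3"}, {"name": "ascorbic acid"}],
    "outcome": {"shape": "rod"},
})

record = parse_completion(example_completion)
if record is not None and is_complete(record):
    print("complete procedure with", len(record["growth_solution"]), "growth reagents")
```

A filter of this kind is one plausible way to separate papers with a complete procedure and outcome from those with only partial extractions, under the assumed schema.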

| Search Query: ArXiv Query: search_query=au:"Paul Alivisatos"&id_list=&start=0&max_results=3
