We assign each OpenAlex author in the 14.6 million-paper BITNET frame a career field and career subfield from the same SciNET taxonomy used for papers. The author label is not a separate model: it is a normalized average of the field probabilities assigned to that author's papers.
How well does it work?
We evaluated a stratified sample of authors with Anthropic's Claude Sonnet 4.5. The judge saw the author's name, institutions, journals, and representative paper titles, then chose the author's primary field/subfield and, when appropriate, a secondary one.
Author field agreement
Does the career aggregate match the LLM judge?
Lenient top-2 counts an author as correct if either of their top two career fields matches the LLM's primary, alternate, or secondary field. This matters for interdisciplinary authors.
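The lenient top-2 rule reduces to a simple membership check. This is a minimal sketch; the function and argument names are illustrative, not the evaluation code itself:

```python
def lenient_top2_match(career_top2, judge_fields):
    """True if either of the author's top two career fields appears
    among the fields the LLM judge accepted (primary, alternate,
    or secondary). Names here are illustrative."""
    return any(field in judge_fields for field in career_top2)
```

A physicist whose second career field is Materials Science would match a judge that called them a materials scientist, even though the top-1 comparison would not.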
Author subfield agreement
The same question at the 304-class subfield grain.
Subfields are intentionally not forced to sit under the author's top field, so mixed careers can remain visible.
Accuracy by number of papers
Accuracy improves as the author-level aggregate has more papers to average over. The buckets below count each author's publications in the BITNET/e5 sample and use the same 833 LLM-judged authors as the overall evaluation.
Field top-1 accuracy
Share where the top career field matches the LLM judge.
Subfield top-1 accuracy
The same evaluation at the 304-class subfield grain.
Field top-2 accuracy
Share where either of the top two career fields matches the LLM judge.
Subfield top-2 accuracy
Share where either of the top two career subfields matches the LLM judge.
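The per-bucket accuracies above come from grouping the LLM-judged authors by paper count and scoring each bucket separately. A minimal sketch, assuming illustrative bucket edges rather than the ones used on the site:

```python
from collections import defaultdict

def accuracy_by_bucket(records, edges=(1, 2, 5, 10)):
    """Top-1 accuracy per paper-count bucket.

    `records` is a list of (n_papers, is_match) pairs for judged
    authors; `edges` are illustrative bucket lower bounds. Keys in
    the result are each bucket's lower edge, so 5 means "5 to 9".
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_papers, is_match in records:
        # Place the author in the bucket with the largest edge
        # not exceeding their paper count.
        lower = max(e for e in edges if e <= n_papers)
        hits[lower] += is_match
        totals[lower] += 1
    return {lower: hits[lower] / totals[lower] for lower in totals}
```

The same routine works for any of the four metrics; only the definition of `is_match` changes (top-1 vs. top-2, field vs. subfield).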
What an author field means
An author's primary field is the field that holds the largest share of their observed publication record in the BITNET frame. It is best read as a career summary: "what field does this author's body of work mostly belong to?" It is not a department, a current job title, or a manually cleaned biography.
Every author also keeps a soft distribution over fields. A physicist with substantial materials-science work, for example, may have Physics as the top field and Materials Science as a close second. The browser shows those secondary probabilities instead of collapsing the author to a single hard label.
How the assignment is made
First, each paper receives up to three field probabilities and up to five subfield probabilities from the paper classifier. For each OpenAlex AuthorID, we add those probabilities across all of the author's papers in the 14.6 million-paper frame, then renormalize the totals so they sum to one. The largest entries become the author's top fields and subfields.
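The aggregation step can be sketched in a few lines. This assumes an illustrative input layout (a list of papers, each with its author IDs and a dict of field probabilities); the names are not the production schema:

```python
from collections import defaultdict

def aggregate_author_fields(papers):
    """Sum each paper's field probabilities per author, renormalize
    to one, and sort so the largest entries come first.

    `papers` is a list of (author_ids, field_probs) pairs, where
    field_probs maps field names to probabilities (up to three per
    paper). The same routine applies unchanged to subfields.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for author_ids, field_probs in papers:
        for author_id in author_ids:
            for field, p in field_probs.items():
                totals[author_id][field] += p

    profiles = {}
    for author_id, scores in totals.items():
        z = sum(scores.values())
        # Renormalize so each author's field shares sum to one.
        profiles[author_id] = sorted(
            ((field, s / z) for field, s in scores.items()),
            key=lambda kv: kv[1],
            reverse=True,
        )
    return profiles
```

For example, an author with one paper scored 0.8 Physics / 0.2 Materials Science and another scored 0.4 Physics / 0.6 Materials Science ends up 0.6 Physics / 0.4 Materials Science, with Physics as the top field and the full distribution retained.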
We tested several weighting rules: treating every paper equally, down-weighting uncertain papers, down-weighting very large teams, and combining both adjustments. Equal weighting performed best overall, so it is the version shown on the website.
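The four weighting schemes can be expressed as a single per-paper weight function. The exact down-weighting formulas below are assumptions for illustration; only the set of schemes compared mirrors the text:

```python
import math

def paper_weight(confidence, team_size, scheme="equal"):
    """Candidate per-paper weights for the author aggregate.

    `confidence` is the paper classifier's certainty and `team_size`
    the author count; the specific formulas are illustrative. Equal
    weighting performed best overall and is what the site uses.
    """
    if scheme == "equal":
        return 1.0
    if scheme == "confidence":
        return confidence  # down-weight uncertain papers
    if scheme == "team":
        return 1.0 / math.sqrt(team_size)  # down-weight very large teams
    if scheme == "both":
        return confidence / math.sqrt(team_size)
    raise ValueError(f"unknown scheme: {scheme}")
```

Each scheme multiplies a paper's field probabilities by its weight before the per-author sums are renormalized, so the ablation only changes how much each paper counts, not the taxonomy.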
We use OpenAlex AuthorIDs as-is. If OpenAlex splits one real person across two IDs, or merges two people into one, that issue is left for a later disambiguation pass.
When to be cautious
Single-paper authors are essentially classified by that one paper. Authors with many papers are more stable because the aggregate averages over more evidence. Mixed careers are also less naturally represented by one top field, so the second field and the confidence values are part of the output rather than decoration.