We assign each OpenAlex author in the 14.6 million-paper BITNET frame a career field and career subfield from the same SciNET taxonomy used for papers. The author label is not a separate model: it is a normalized average of the field probabilities assigned to that author's papers.
How well does it work?
We evaluated a stratified sample of authors with Anthropic's Claude Sonnet 4.5. The judge saw the author's name, institutions, journals, and representative paper titles, then chose the author's primary field/subfield and, when appropriate, a secondary one.
Author field agreement
Does the career aggregate match the LLM judge?
Lenient top-2 counts an author as correct if either of their top two career fields matches the LLM's primary, alternate, or secondary field. This matters for interdisciplinary authors.
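The lenient top-2 rule reduces to a simple membership check. This is a minimal sketch; the function and argument names are illustrative, not the evaluation code itself:

```python
def lenient_top2_match(career_top2, judge_fields):
    """True if either of the author's top two career fields appears
    among the fields the LLM judge accepted (primary, alternate,
    or secondary). Names here are illustrative."""
    return any(field in judge_fields for field in career_top2)
```

A physicist whose second career field is Materials Science would match a judge that called them a materials scientist, even though the top-1 comparison would not.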
Author subfield agreement
The same question at the 304-class subfield grain.
Subfields are intentionally not forced to sit under the author's top field, so mixed careers can remain visible.
Accuracy by number of papers
Accuracy improves as the author-level aggregate has more papers to average over. The buckets below count each author's publications in the BITNET/e5 sample and use the same 833 LLM-judged authors as the overall evaluation.
Field top-1 accuracy
Share where the top career field matches the LLM judge.
Subfield top-1 accuracy
The same evaluation at the 304-class subfield grain.
Field top-2 accuracy
Share where either of the top two career fields matches the LLM judge.
Subfield top-2 accuracy
Share where either of the top two career subfields matches the LLM judge.
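The per-bucket accuracies above come from grouping the LLM-judged authors by paper count and scoring each bucket separately. A minimal sketch, assuming illustrative bucket edges rather than the ones used on the site:

```python
from collections import defaultdict

def accuracy_by_bucket(records, edges=(1, 2, 5, 10)):
    """Top-1 accuracy per paper-count bucket.

    `records` is a list of (n_papers, is_match) pairs for judged
    authors; `edges` are illustrative bucket lower bounds. Keys in
    the result are each bucket's lower edge, so 5 means "5 to 9".
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for n_papers, is_match in records:
        # Place the author in the bucket with the largest edge
        # not exceeding their paper count.
        lower = max(e for e in edges if e <= n_papers)
        hits[lower] += is_match
        totals[lower] += 1
    return {lower: hits[lower] / totals[lower] for lower in totals}
```

The same routine works for any of the four metrics; only the definition of `is_match` changes (top-1 vs. top-2, field vs. subfield).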
What an author field means
An author's primary field is the field that holds the largest share of their observed publication record in the BITNET frame. It is best read as a career summary: "what field does this author's body of work mostly belong to?" It is not a department, a current job title, or a manually cleaned biography.
Every author also keeps a soft distribution over fields. A physicist with substantial materials-science work, for example, may have Physics as the top field and Materials Science as a close second. The browser shows those secondary probabilities instead of collapsing the author to a single hard label.
How the assignment is made
First, each paper receives up to three field probabilities and up to five subfield probabilities from the paper classifier. For each OpenAlex AuthorID, we add those probabilities across all of the author's papers in the 14.6 million-paper frame, then renormalize the totals so they sum to one. The largest entries become the author's top fields and subfields.
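The aggregation step can be sketched in a few lines. This assumes an illustrative input layout (a list of papers, each with its author IDs and a dict of field probabilities); the names are not the production schema:

```python
from collections import defaultdict

def aggregate_author_fields(papers):
    """Sum each paper's field probabilities per author, renormalize
    to one, and sort so the largest entries come first.

    `papers` is a list of (author_ids, field_probs) pairs, where
    field_probs maps field names to probabilities (up to three per
    paper). The same routine applies unchanged to subfields.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for author_ids, field_probs in papers:
        for author_id in author_ids:
            for field, p in field_probs.items():
                totals[author_id][field] += p

    profiles = {}
    for author_id, scores in totals.items():
        z = sum(scores.values())
        # Renormalize so each author's field shares sum to one.
        profiles[author_id] = sorted(
            ((field, s / z) for field, s in scores.items()),
            key=lambda kv: kv[1],
            reverse=True,
        )
    return profiles
```

For example, an author with one paper scored 0.8 Physics / 0.2 Materials Science and another scored 0.4 Physics / 0.6 Materials Science ends up 0.6 Physics / 0.4 Materials Science, with Physics as the top field and the full distribution retained.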
We tested several weighting rules: treating every paper equally, down-weighting uncertain papers, down-weighting very large teams, and combining both adjustments. Equal weighting performed best overall, so it is the version shown on the website.
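The four weighting schemes can be expressed as a single per-paper weight function. The exact down-weighting formulas below are assumptions for illustration; only the set of schemes compared mirrors the text:

```python
import math

def paper_weight(confidence, team_size, scheme="equal"):
    """Candidate per-paper weights for the author aggregate.

    `confidence` is the paper classifier's certainty and `team_size`
    the author count; the specific formulas are illustrative. Equal
    weighting performed best overall and is what the site uses.
    """
    if scheme == "equal":
        return 1.0
    if scheme == "confidence":
        return confidence  # down-weight uncertain papers
    if scheme == "team":
        return 1.0 / math.sqrt(team_size)  # down-weight very large teams
    if scheme == "both":
        return confidence / math.sqrt(team_size)
    raise ValueError(f"unknown scheme: {scheme}")
```

Each scheme multiplies a paper's field probabilities by its weight before the per-author sums are renormalized, so the ablation only changes how much each paper counts, not the taxonomy.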
We use OpenAlex AuthorIDs as-is. If OpenAlex splits one real person across two IDs, or merges two people into one, that issue is left for a later disambiguation pass.
When to be cautious
Single-paper authors are essentially classified by that one paper. Authors with many papers are more stable because the aggregate averages over more evidence. Mixed careers are also less naturally represented by one top field, so the second field and the confidence values are part of the output rather than decoration.