Funding metadata in OpenAlex

Posted on January 26, 2026 by Kyle Demes

With the Walden launch behind us, 2026 promises to be an exciting year for OpenAlex. And thanks to a transformative grant from Wellcome of $3.6M over three years, funding metadata will be a major focus of that development.

This Wellcome-funded project aims to make funding information a first-class part of the open scholarly graph so that funders, institutions, researchers, and tool-builders can rely on open, structured, reusable funding metadata.

Below is a progress update on what we’ve shipped so far, what we’re working on now, and how funders can help shape what comes next.

Why funding metadata (and why now)

Funding data is essential infrastructure for research strategy and accountability: funders need to understand what they supported, what it produced, and what changed as a result. They also need global data to position their work within the global funding landscape.

But today, most funding intelligence workflows still depend on closed databases or on burdensome reporting from grantees into siloed funder databases. OpenAlex already provides a comprehensive, open inventory of research outputs. This project extends that foundation so funding metadata becomes similarly open, structured, and connected.

What’s new in OpenAlex

We are hosting a webinar on February 19, 2026 at 10am EST to review updates in more detail and allow time for interactive Q&A. You can register for that webinar here, and a recording will be available on our YouTube channel afterwards. Here’s a quick update on recent progress.

1) We’re mining full text to match funders to outputs

We’ve begun matching funder names to research outputs through full-text data mining, adding millions of new linkages between funders and their outputs.

We have just started this work and have tens of millions of PDFs to continue working through, but the momentum is building quickly.
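To make the idea concrete, here is a deliberately naive sketch of name-based matching. It is not the OpenAlex pipeline: the alias table, the function name, and the sample acknowledgement are all hypothetical, and a real system would need far more careful disambiguation.

```python
import re

# Hypothetical alias table: canonical funder name -> name variants seen in text.
FUNDER_ALIASES = {
    "Wellcome": ["Wellcome Trust", "Wellcome"],
    "National Science Foundation": ["National Science Foundation", "NSF"],
}


def find_funders(text: str) -> list[str]:
    """Return canonical funder names whose aliases appear as whole words in text."""
    found = []
    for canonical, aliases in FUNDER_ALIASES.items():
        for alias in aliases:
            # \b keeps "NSF" from matching inside a longer token such as "NSFC".
            if re.search(rf"\b{re.escape(alias)}\b", text):
                found.append(canonical)
                break  # one hit per funder is enough
    return found


ack = "This work was supported by the Wellcome Trust and by NSF grant 1234567."
print(find_funders(ack))  # ['Wellcome', 'National Science Foundation']
```

Even this toy version shows why matching quality matters: short acronyms collide easily, so the word-boundary check is the minimum safeguard.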

2) “Awards” are now first-class objects in the OpenAlex graph

We’ve updated the OpenAlex schema so awards are first-class citizens, with their own entity type and API endpoint: https://api.openalex.org/awards

This is foundational work: it lets us represent grants/awards as structured nodes in the graph (instead of only as scattered fragments attached to works), which is required for reliable linking, curation, and downstream funding intelligence.
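A minimal sketch of calling the endpoint from Python, using only the standard library. The endpoint URL comes from this post; the helper names are ours, the `per-page` parameter in the usage note is assumed to behave as it does on other OpenAlex endpoints, and nothing is assumed about the shape of the JSON that comes back.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AWARDS_ENDPOINT = "https://api.openalex.org/awards"  # from the post


def awards_url(params: Optional[dict] = None) -> str:
    """Build an awards request URL; query parameters are the caller's assumption."""
    if not params:
        return AWARDS_ENDPOINT
    return f"{AWARDS_ENDPOINT}?{urlencode(params)}"


def fetch_awards(params: Optional[dict] = None) -> dict:
    """Fetch and parse the JSON response (requires network access)."""
    with urlopen(awards_url(params), timeout=30) as response:
        return json.load(response)


print(awards_url())  # https://api.openalex.org/awards
```

For example, `fetch_awards({"per-page": "5"})` would request a short page of awards, assuming that parameter is supported here.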

3) When DOIs are registered for grants, they appear in OpenAlex

Any funder registering DOIs for grants can now have their award metadata show up in OpenAlex almost immediately after registration. We’ve built this integration for Crossref award DOIs and will soon have completed the integration for DataCite award DOIs as well.

4) We’re ingesting grant metadata directly from funders

We’ve started ingesting funding metadata directly from funders who make their grant data available online but don’t mint DOIs. At the time of posting this, we had already ingested 11.5M grants.

This is critical: To build a comprehensive database of funding metadata, we need to meet funders where they’re at and ingest their data directly in the formats they’ve made available.

What we’re working on next

Here’s what we’re working on during 2026:

Full-text matching (finish running across our corpus of full text; set up an ongoing pipeline for new PDFs)
Improving matching quality (funder name disambiguation)
Grant ID matching (create linkages between individual grant IDs and papers)
Scaling ingest across many funders and formats (from well-structured national databases to the long tail of smaller or distributed sources)
- We’re starting with a seed list of 50 funders to develop these pipelines. You can check out that list and monitor our progress here.
- We’ll scale funder ingest later this year, but if you want to suggest specific funders you don’t see on our roadmap yet, email kyle@openalex.org.
Expanding linkages beyond acknowledgements by incorporating trusted reporting sources wherever possible (e.g., funder impact reports)
Clarifying and prioritizing use cases so we build the funding intelligence workflows funders actually need
Pilot apps that suggest linkages between grants and outputs (e.g., based on vector distance of text in grants and outputs)
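For the last item, here is a rough sketch of what "vector distance" could look like in practice: rank candidate outputs by cosine similarity to a grant's embedding. The three-dimensional vectors and work IDs are toy values invented for illustration; real embeddings would come from a text-embedding model, which this sketch does not cover.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_outputs(grant_vec: list[float],
                    output_vecs: dict[str, list[float]],
                    top_k: int = 2) -> list[str]:
    """Return the IDs of the top_k outputs most similar to the grant."""
    scored = [(cosine_similarity(grant_vec, vec), work_id)
              for work_id, vec in output_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [work_id for _, work_id in scored[:top_k]]


# Toy embeddings (hypothetical values).
grant = [1.0, 0.0, 1.0]
outputs = {
    "work_a": [0.9, 0.1, 1.1],
    "work_b": [0.0, 1.0, 0.0],
    "work_c": [1.0, 1.0, 0.0],
}
print(suggest_outputs(grant, outputs))  # ['work_a', 'work_c']
```

Suggestions like these would be candidates for review rather than confirmed links, which is why they sit alongside trusted reporting sources in the list above.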

Funder workshop in London: April 27–28, 2026

We’re convening an in-person workshop with collaborating funders on April 27–28, 2026 in London, England.

The goals are to:

Review what we’ve learned so far (what’s working, what’s messy, what needs partner input)
Confirm and refine funder use cases for open funding intelligence and impact reporting
Jointly shape the next phase of the project—both technical priorities and outreach activities to scale this initiative globally in the following two years

We will publish a report summarizing the workshop and detailing next phases of the project.

Call to action: we’re looking for funder collaborators (all shapes and sizes)

If you’re a funder—large or small, national or regional, public or private, anywhere in the world—we’d love to talk.

With each funder collaborator, we’re looking to:

Assess the current state of their grant metadata (coverage, structure, identifiers, openness, and constraints)
Help make their award records (and impact reports) easier to discover and reuse when possible
Ingest their grant metadata into OpenAlex to improve linkages between awards and outputs
Fully understand the funding intelligence use cases that matter most to them, so the open dataset supports real reporting and strategy needs

How to get started

The simplest next step is an introductory meeting.

Email the project lead and OpenAlex COO, Kyle Demes: kyle@openalex.org

Thanks (and more soon)

—Kyle

OpenAlex 2026 Roadmap

Posted on January 16, 2026 by Jason

We just wrapped up our Q1 2026 Town Hall. You can watch the full recording here, but this post covers the highlights: what we shipped last quarter, what’s coming this quarter, and why we think 2026 is a pivotal year for open science.

What we shipped in Q4

The Walden rewrite is done. OpenAlex now runs on a modern Databricks infrastructure that lets us ship faster and iterate on data quality in days instead of months.

We added 192 million new works from DataCite and repositories. OpenAlex now indexes 477 million works—the largest connected repository of scholarship ever published.

On funders and awards: we created Awards as a first-class entity, extracted 27 million funder links from fulltext PDFs, and integrated 15 new funders directly.

What’s coming in Q1

For enterprise users: Credit-based API pricing launches this month. Different calls cost different amounts:

a singleton (/works/w123) is 1 credit,
a list (/works?filter=foo:bar) is 10,
PDF content (coming this month!) is 100,
vector search is 1,000 (coming soon! Email steve@ourresearch.org for early access!).
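To get a feel for how those prices add up, here is a small worked example. The per-call costs are the ones listed above; the dictionary keys and the usage numbers are made up for illustration.

```python
# Credits per call type, as listed in the post (key names are our own labels).
CREDIT_COSTS = {
    "singleton": 1,
    "list": 10,
    "pdf_content": 100,
    "vector_search": 1000,
}


def total_credits(call_counts: dict[str, int]) -> int:
    """Sum credits for a batch of calls, keyed by call type."""
    return sum(CREDIT_COSTS[kind] * count for kind, count in call_counts.items())


# Hypothetical day of usage.
usage = {"singleton": 500, "list": 40, "pdf_content": 3, "vector_search": 1}
print(total_credits(usage))  # 500*1 + 40*10 + 3*100 + 1*1000 = 2200
```

Note how a single vector search costs as much as a thousand singleton lookups in this scheme.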

We’re also launching a sync service so you can pull daily updates in one chunk instead of polling millions of records.

For institutions: Affiliation matching curation launches in February. Members can edit the matching algorithm that links affiliation strings to their institution. Changes propagate to the API within a day—permanently improving the dataset for everyone.

We’re also launching two membership tiers at $5k and $20k/year that include the ability to curate your own data in OpenAlex, training/consulting, and pro API keys with higher API access for your faculty.

For researchers: A complete rewrite of author name disambiguation ships by end of Q1. This has always been the hardest problem in bibliometrics. With today’s AI, we think we can build the most accurate system ever made.

The bigger picture

There’s a lot more I want to say about why 2026 feels like a pivotal year—why we think the GUI is dead, why open data wins the AI era, and what that means for OpenAlex. I’ll save that for a follow-up post. For now: watch the town hall to hear the full argument, and try the vibe-coded demo I built live during the talk. And join our mailing list to stay up-to-date on all the wild stuff we’re doing this year. It’s going to be, by far, our biggest year ever. You ain’t seen nothing yet.

OpenAlex and NORA Collaborate to connect publications to the OECD FORD Taxonomy

Posted on January 16, 2026 by Kyle Demes

OpenAlex and NORA (the Danish National Open Research Analytics team) are pleased to announce a collaboration mapping the OpenAlex research classification system to the OECD Fields of Research and Development (FORD) taxonomy. This alignment supports the upcoming launch of the new Danish Research Portal, but also enables OpenAlex users globally to use the taxonomy in their research analytics.

🎯 Why This Matters for Research Analytics

Widely adopted taxonomies like OECD FORD are critical for international benchmarking, reporting, and policy alignment. At the same time, national governments, research institutions, and regional bodies often rely on their own classification schemes that reflect local research priorities and funding strategies.

By linking OpenAlex’s aboutness classification system with the OECD FORD taxonomy, this collaboration creates:

A bridge between global standards and national strategy
An open and transparent alternative to proprietary classification systems
A pathway for countries and institutions to conduct policy-relevant analytics using fully open data
A blueprint for creating crosswalks between OpenAlex and additional research taxonomies

This mapping supports both broader interoperability and regionally specific analysis—without compromising either goal.

🧭 How We Built the Mapping

The mapping was developed using a systematic methodology that relates OpenAlex research subfields with OECD FORD categories. OpenAlex uses metadata about research articles (e.g., title, abstract, journal) to classify research outputs into research topics, subfields, fields, and domains (full documentation here).

OpenAlex subfields were successfully mapped to 38 out of 42 two-digit FORD fields.
The four remaining categories did not have direct equivalents given the current OpenAlex taxonomy structure.
The resulting crosswalk supports comprehensive coverage of major research areas across the OECD framework.

The figure below shows the number of OpenAlex subfields that were mapped to each FORD category. A full table listing each OpenAlex subfield and its corresponding FORD categories is available here.
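As a rough sketch of how a crosswalk like this can be applied, the snippet below tallies works by FORD category from their OpenAlex subfield. The two mapping entries are illustrative placeholders, not rows copied from the published table, and the function name is hypothetical. A subfield may map to more than one category, so each value is a list.

```python
from collections import Counter

# Illustrative entries only; the real crosswalk is the table linked in the post.
CROSSWALK = {
    "Artificial Intelligence": ["1.2 Computer and information sciences"],
    "Statistics and Probability": ["1.1 Mathematics"],
}


def ford_counts(work_subfields: list[str]) -> Counter:
    """Count works per FORD category; subfields without a mapping go to 'unmapped'."""
    counts: Counter = Counter()
    for subfield in work_subfields:
        for category in CROSSWALK.get(subfield, ["unmapped"]):
            counts[category] += 1
    return counts


works = ["Artificial Intelligence", "Artificial Intelligence", "Ecology"]
print(ford_counts(works))
```

Keeping an explicit "unmapped" bucket makes it easy to see how much of a dataset falls into subfields the crosswalk does not cover.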

🤖 Combining Expert Knowledge with AI

To ensure quality and scalability, we employed a dual approach:

A human expert (from OpenAlex) manually assigned OpenAlex subfields to FORD categories.
The same task was conducted using ChatGPT to test whether AI could reliably assist in classification alignment.

Out of 250+ assignments, the two approaches differed in only 11 cases. These were reviewed in collaboration with researchers in those fields: ChatGPT’s classification was determined a better fit in 7 of the 11 cases, while the human’s classification was a better fit only 4 times!
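Finding the cases to review is a simple comparison of the two sets of assignments. The sketch below assumes one category per subfield for brevity, and the labels are hypothetical examples rather than the actual assignments from this project.

```python
def disagreements(human: dict[str, str], model: dict[str, str]) -> list[str]:
    """Return subfields that both sources labeled but labeled differently."""
    return sorted(s for s in human if s in model and human[s] != model[s])


# Hypothetical assignments (subfield -> FORD code).
human_labels = {"Artificial Intelligence": "1.2", "Ecology": "1.6", "Biophysics": "1.3"}
model_labels = {"Artificial Intelligence": "1.2", "Ecology": "1.6", "Biophysics": "1.6"}

to_review = disagreements(human_labels, model_labels)
print(to_review)  # ['Biophysics']
```

Only the disagreements need expert attention, which is what keeps this kind of dual approach cheap to run.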

This result gives both teams confidence in using AI to assist with future classification crosswalks—especially as a way to accelerate mappings between OpenAlex and other national or domain-specific taxonomies.

📊 What the Mapping Enables

Once mapped, the classifications were applied by NORA to publications in the Danish Research Portal, which aggregates research outputs from across Denmark’s institutions. The FORD classifications derived from OpenAlex were then compared with classifications from Scopus and Web of Science.

While proprietary licensing prevents sharing of detailed comparisons, results from the three systems were broadly aligned, with some differences reflecting their underlying methodologies. Importantly, this confirms that open infrastructure can meet the same analytical needs traditionally served by closed systems.

🚀 What’s Next

OpenAlex users around the world can apply the crosswalk in their own analyses. If you think it would be useful for us to expose the OECD FORD classifications directly in our public API, let us know! If there is enough interest, we’ll add it this year.
The Danish Research Portal will launch in mid-2026, showcasing Danish research outputs across the OECD FORD classifications.

With the new OpenAlex Walden system, we look forward to expanding support for multiple taxonomies to meet the needs of different countries, research communities, and policy environments.

⚠️ Important Note on Use

This mapping is not formally endorsed by the OECD. We consulted with the OECD team and shared preliminary results to ensure accuracy and transparency. However, users conducting official reporting should validate the mapping according to their institutional or national guidance.

🌍 A Shared Vision for Open, Interoperable Research Infrastructure

This collaboration demonstrates what is possible when national research infrastructure and open data providers work together to align global and local needs. By combining methodological rigor, AI-assisted innovation, and a commitment to openness, NORA and OpenAlex are helping advance a more interoperable and transparent research ecosystem.

If your organization or country uses its own classification system and is interested in implementing it in OpenAlex, we invite you to reach out and collaborate with us.

— The OpenAlex and NORA Teams

OpenAlex: 2025 in Review

Posted on January 5, 2026 (updated January 7, 2026) by Kyle Demes

2025 was a defining year for OpenAlex. After two years of learning what the world needs from OpenAlex, we spent last year rebuilding our entire foundation and massively expanding our coverage. During this rebuild, we served exponential growth across academia, government, and industry, solidifying OpenAlex as essential global infrastructure for research.

A New Foundation: Walden Launch

At the end of the year, we launched Walden, the complete rewrite of the OpenAlex system.

On day one, Walden added more than 190 million new works, including records from DataCite and thousands of institutional repositories. For the first time, OpenAlex now creates records even when research exists only in repositories—making millions of previously hard-to-find works truly discoverable. These new records currently live as a dedicated subset (xpac) while we continue strengthening metadata before full integration into the core index.

Walden also gives OpenAlex a modern, flexible architecture making it faster to add new sources, easier to improve quality at scale, and ready for the next generation of features and curation.

Unprecedented Adoption & Global Reach

Use of OpenAlex grew dramatically, ending the year with:

350,000+ monthly unique visitors to our UI
3+ million monthly pageviews on our UI
1.5 billion monthly API calls across OpenAlex (1B) + Unpaywall (0.5B), exceeding Crossref for the first time!
1,100+ research outputs in 2025 referencing OpenAlex

Rebranding and Clarifying the Mission

As OpenAlex continued to expand, it became clear that OpenAlex is not just one of our products—it is our mission. And in 2025, we reorganized to reflect that realization.

Today:

OpenAlex is the purpose and platform.
Unpaywall is a slice of the OpenAlex database delivered in a specific format.
Unsub is a dashboard built on top of OpenAlex, supporting specific use cases.

This unified identity makes it clearer for our users, clearer for our partners, and clearer for ourselves what we are collectively building together.

Financial Progress & Sustainability

We achieved major sustainability milestones in 2025:

Reached our year 2 $800k ARR target—three months ahead of schedule
Received a $3.5M Wellcome grant to integrate global research funding metadata
Continued strong renewal rates and growing institutional engagement

Running both the old and new systems in parallel, supporting unprecedented usage growth, and delivering Walden led to higher costs than projected. But these were intentional investments to make OpenAlex stronger, more scalable, and more valuable for the long term.

Looking Ahead

With Walden now live, we’re excited to start our next chapter. In 2026, we will:

Launch full community curation pipelines
Integrate global funding metadata
Begin integrating research software as first-class research objects
Deepen partnerships with governments, universities, and industry, rolling out new support models and new features.
Continue strengthening sustainability and reliability

Thank You

To everyone who contributed, partnered, advocated, experimented, and trusted OpenAlex this year: thank you! We are thrilled and humbled to watch OpenAlex become the open, global scholarly knowledge graph the world depends on and are deeply aware that none of this happens without you.

Here’s to an even bigger 2026.

— The OpenAlex Team

OpenAlex rewrite (“Walden”) launch!

Posted on November 3, 2025 (updated December 13, 2025) by Jason

Today, OpenAlex gets a new engine.

After a year of rebuilding, refactoring, and retesting, the Walden rewrite is now live — powering all of OpenAlex. It’s the same dataset shape you know, but faster, cleaner, and more complete.

You’ll notice better references, better OA detection, better language and license coverage, better everything. We’ve added 190 million new works, including datasets, software, and other research objects from DataCite and thousands of repositories. And thanks to our new foundation, fixes and improvements now roll out in days, not months.

Want to see exactly what changed? Check out OREO — the OpenAlex Rewrite Evaluation Overview — to compare old vs. new data in detail. [edit Dec 13, 2025: OREO is no longer up because the legacy OpenAlex data is no longer being updated…it’s all Walden now, so there’s no comparator].

And if you’d like to dig into the full list of updates, the Walden release notes have you covered.

For the next few weeks, you can still access the old dataset with data-version=1, and starting tomorrow, you can download full snapshots of both the legacy and Walden datasets in the usual way.

The rebuild is done. The road ahead is wide open.

Onward.

A Better Way to Detect Language in OpenAlex—and a Better Way to Collaborate

Posted on October 20, 2025 by Kyle Demes

As part of the recent Walden system launch, we’ve improved how OpenAlex detects the language of scholarly works. The results are immediately visible in the data: many more works are now correctly recognized as non-English, new languages appear that weren’t represented at all before, and previously unclassified works now have accurate language assignments.

The chart below (source) shows the number of works attributed to each language in Classic vs. Walden OpenAlex. Most languages fall above the diagonal line, meaning that Walden classifies more works with that language than Classic did. The cluster of languages along the y-axis had no works at all in Classic OpenAlex but now have works in Walden.

We’re excited about this improvement. But the story behind this improvement is just as important as the technical result—it’s a model for how the research community and open infrastructures like OpenAlex can collaborate to make real, shared progress.

From helpful critique to a true collaboration

Last year, a group of researchers published a preprint evaluating OpenAlex’s language-classification system using a large multilingual gold standard (Céspedes et al., arXiv:2409.10633v2, now published as https://doi.org/10.1002/asi.24979). We were excited to see that an international research collaborative had undertaken such a significant project using OpenAlex with the aim of improving its usefulness for the global research community. Their study was rigorous and thoughtful, and it confirmed something we already knew: our approach to language detection could be improved.

However, the paper stopped short of evaluating and recommending the concrete next steps we could take to improve language detection in OpenAlex. We hadn’t been involved at the beginning of the study to provide the authors with the kinds of metrics or performance comparisons that would actually let us deploy a better model in production. But after publication, we met with some of the authors to discuss what we needed to be able to turn their work into improvements in OpenAlex.

We needed precision and recall metrics for multiple competing candidate algorithms (with a bias towards precision); and
We needed analysis that considered cost and runtime, given that any model we deploy must scale to 400 million+ records.
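To illustrate the first requirement, here is a minimal sketch of per-language precision and recall, plus an F-beta score with beta below 1, which is one common way to weight precision more heavily than recall. This is our illustration, not the evaluation code from the study, and the labels are toy data.

```python
def precision_recall(gold: list[str], predicted: list[str],
                     language: str) -> tuple[float, float]:
    """Precision and recall for one language, treating it as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == language and p == language)
    fp = sum(1 for g, p in pairs if g != language and p == language)
    fn = sum(1 for g, p in pairs if g == language and p != language)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy gold standard vs. one candidate model's output.
gold = ["da", "da", "no", "en", "da"]
pred = ["da", "no", "no", "en", "da"]
p, r = precision_recall(gold, pred, "da")
print(p, r, f_beta(p, r))
```

Here the model never wrongly claims Danish (precision 1.0) but misses one Danish work (recall 2/3), and the F0.5 score rewards that trade-off more than a plain F1 would.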

The researchers enthusiastically took on the additional work, checking in with us throughout the process to make sure they were on the right track. The result was a preprint from their follow-on study (Sainte-Marie et al., arXiv:2502.03627) that provided exactly the applied, scalable insight we needed.

Turning research into real-world impact

As part of the Walden rewrite, we implemented one of the top-recommended approaches from their study. The improvement has been dramatic:

More works are now correctly classified as non-English languages, instead of being incorrectly labeled as English.
New languages, previously absent from OpenAlex, are now detected for the first time.
Previously “null” records now have reliable language tags.

Before deploying the new model in production, we already knew from the researchers’ analyses and their multilingual gold-standard sets that it would yield a strong overall improvement across the corpus. But we wanted to confirm that in practice. So we manually reviewed a random sample of works whose language classification differed between the old and new systems—and in the vast majority of those cases, the new system was correct.

We also validated against real-world feedback. For instance, the NORA team at Research Portal Denmark had previously submitted support tickets detailing mix-ups between Danish and Norwegian, two languages that are notoriously similar in writing. In ~75% of those cases, the new system now gets it right.

A model for future collaboration

To be clear: we value and learn from every independent evaluation of OpenAlex. One-way critiques from researchers are a vital part of the open-infrastructure ecosystem, and we deeply appreciate the time and expertise the global research community is investing in making OpenAlex better.

What made this case stand out was the second step: turning that critique into a direct collaboration that produced immediately deployable improvements. By working together, we created a fast-tracked feedback loop—from identifying issues in OpenAlex, to developing and testing solutions, to rolling out fixes across hundreds of millions of records. It’s a model we’d love to repeat.

And this is only the beginning. In the next few weeks, we’ll be launching a new community curation system letting researchers and metadata experts around the world submit corrections directly to OpenAlex—creating an even faster, more transparent, and more collaborative way to improve research metadata at scale.

Stay tuned—and thank you to everyone helping make open research information better, one contribution (and one collaboration) at a time.

OpenAlex rewrite enters beta! 🎉

Posted on October 1, 2025 by Jason

It’s a big week at OpenAlex. On Monday, we announced that OpenAlex is now our top-level brand (and retired the “OurResearch” name). Yesterday we unveiled our new logo. And today, we’re thrilled to launch the beta release of our fully-rewritten codebase (codenamed Walden)!

Walden is faster, bigger, and more maintainable–that means quicker bug fixes, more content, easier feature development, and a smoother experience all around.

Throughout October, we’ll be running Walden and the old system (Classic) side by side, with Classic remaining the default. On November 1, 2025, Walden becomes the default, and we’ll publish the last data snapshot from the old system (more info on timelines here).

How to test-drive Walden

Walden beta is already live in the API and UI so you can start exploring it right away!

In the UI: click the little 🧪 test-tube icon in the top right (or click here).
In the API: just add data-version=2 to your request, like this: https://api.openalex.org/works?data-version=2.
In OREO: Compare Classic to Walden using the OpenAlex Rewrite Evaluation Overview (OREO, yum). Using OREO you can see exactly what’s changed (good and bad), view known issues, and track our continuous improvements throughout our October beta.
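If you are scripting against the API, the version switch is just one more query parameter. A minimal sketch, assuming you only want to build the request URL; the helper name is ours, and the `data-version` parameter is the one described above for the beta period.

```python
from urllib.parse import urlencode

WORKS_ENDPOINT = "https://api.openalex.org/works"


def works_url(data_version: int) -> str:
    """Build a works URL that pins the request to a given data version."""
    return f"{WORKS_ENDPOINT}?{urlencode({'data-version': data_version})}"


print(works_url(2))  # https://api.openalex.org/works?data-version=2
```

Building both URLs side by side, `works_url(1)` and `works_url(2)`, is an easy way to compare the same query across the two systems while both are available.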

Just remember that it’s still in beta: there are lots of known issues and it’s changing every day. If you notice an issue that’s not already in OREO tests or known issues, report it here.

Key improvements

When you check it out, what should you expect to see? The best way to view a list of improvements is to check out the tests in OREO, especially work tests. But here’s a high-level overview:

150M+ new works: Newly indexed articles, books, datasets, software, dissertations, and more! You can explore just the newly added works here.
Better consistency: Unpaywall and OpenAlex will now always agree.
Better metadata: more citations, more language and retraction coverage, better keywords, more OA data.

Looking Ahead

The last year of rewriting OpenAlex was tough. We couldn’t move as fast as we wanted on new features, and support often lagged. But now we’re equipped to move fast without breaking things. Expect faster improvements, better support, and more ambitious features dropping in Q4, including:

Community curation: fix mistakes (like in Wikipedia) and see them reflected in days.
Vector search endpoint: find relevant works and other entities based on semantic similarity of free-form text
Download endpoint: Access PDF text from DOI or OpenAlex ID
Better funding metadata: New grants entity with better coverage of grant objects and linkages to research outputs and funders

This is a turning point for OpenAlex—and we’re excited to build the future of research infrastructure together with you. The engine’s rebuilt. The road ahead is wide open. Let’s go.

PS want to learn more about Walden? Come to our webinar Oct 7th at 10am Eastern. You can register to attend here.

A New Logo for OpenAlex

Posted on September 30, 2025 by Jason

This is a big week for OpenAlex: yesterday we reorganized under the OpenAlex brand, and tomorrow we launch our completely rewritten codebase (beta). Today we launch our new logo!

The old logo was unique and conveyed the idea of building, which we loved. But it was also visually complex, almost Escheresque; consequently, it didn’t scale down well, and it failed to convey the directness, boldness, and simplicity of our vision: to create a universal, open library of scholarly information.

So as we start a new chapter in OpenAlex, it’s a great time to also launch a new look.

You Bring the Color

We’re doubling down on black and white. That’s not just a design choice, it’s a statement of philosophy. OpenAlex is infrastructure. We’re the pipes under the city, not the flashy towers above it. We want to stay out of your way and let your projects, your creativity, your insights provide the color. You’ll see this commitment carry through in our website, which now leans harder into that clean, monochrome look.

A New Typeface

We’ve switched to Inter. Of course it’s open, just like us. Inter is modern and businesslike, but still human, readable, and approachable. Compared to Dosis (our old font), Inter is sharper, more confident, and more professional—while keeping the sense of openness that’s core to who we are. You’ll see Inter across our site from now on.

The Icon

The new icon is simple: a single, continuously curved outline forms three joined dots. Individually, dots are just dots—but when you connect them, something new emerges:

It’s an A for Alex—but sans crossbar, offering an open doorway in.
It’s a connected network—could be works and citations, coauthorships, or any of the billions of nodes and edges in the OpenAlex graph.
It’s a simplified water molecule.

Why Water?

Water has increasingly become part of our story of what OpenAlex is. Water’s simple but essential. We count on shared infrastructure to deliver it, quietly and reliably, cheaply, but not for free.

At OpenAlex we want to be the pipes under the scholarly city: infrastructure that delivers research information wherever it’s needed, at scale, for cheap. We’re here to support all the amazing things the research ecosystem is doing—quietly, reliably, and everywhere.

A Modern Library of Alexandria

The original Library of Alexandria aimed to collect all scholarly knowledge. OpenAlex means to carry that spirit forward in the digital age: building an open, connected, comprehensive graph of the world’s research.

Our new logo—an open A, a network, a molecule of water—is our way of putting that vision front and center. And if you see a pyramid in this logo, it’s not an accident, it’s homage.

The new logo is a reminder of what we’re building: a simple, open, essential infrastructure for the world’s research information: cheap, reliable, everywhere. That’s OpenAlex.

PS for logo nerds, other inspo: Vercel, the Banner of Peace, and iconic Paul Rand banger Westinghouse.

We’re now OpenAlex

Posted on September 29, 2025 by Jason

For years, we’ve been working under the name OurResearch. That name sat at the top of our org chart, with three child projects under it: OpenAlex, Unpaywall, and Unsub.

Starting today, things are simpler: that org chart now has just one parent—OpenAlex—with Unpaywall and Unsub beneath it.

Why the change? Three reasons:

1. Fewer brands is clearer

We’re a tiny team, and having so many brands has always been confusing. People wondered: are we OurResearch (or Our Research), or OpenAlex, or Unpaywall, or something else? From now on, the answer is simple: we’re OpenAlex.

2. OpenAlex is what we do

More and more, OpenAlex is the center of our work. It’s our biggest project and the one that takes most of our time. And it’s also the data engine behind our other projects: Unpaywall and Unsub both run on OpenAlex data. In fact, with the launch of our fully rewritten OpenAlex codebase (codenamed Walden) this week, Unpaywall runs as a subroutine of the OpenAlex codebase.

So in a real sense, Unpaywall and Unsub are just friendly wrappers around OpenAlex. Improving OpenAlex improves them automatically.

And the name OpenAlex, with its homage to the ancient Library of Alexandria, captures our long-term vision to gather, organize, and make open all scholarly information.

3. New name, new start

Legally, nothing dramatic is happening—our official name has always been Impactstory, Inc., and “OurResearch” was just a DBA. But this moment is more than just a bookkeeping change.

This is a new chapter for us. The past year has been tough: not much visible progress, a lot of repaying technical debt, and a long slog to rewrite our entire codebase. But that rewrite launches (in beta) this week. And with a fresh codebase comes a fresh start: we get to focus harder, move faster, and pour our energy into making OpenAlex as comprehensive, accurate, and open as possible.

So yes, the name change simplifies things. But more importantly, it marks a new focus and a renewed commitment to our vision: building a universal library of scholarship.

And while we’ll continue to support Unpaywall and Unsub for now, we want to be transparent: OpenAlex is the future. As its functionality grows over the next year or two, Unpaywall and Unsub users will be able to meet their use-cases directly via OpenAlex. The rising tide of OpenAlex lifts all boats.

This week is about OpenAlex

This post is the first of three announcements:

Monday: our name change to OpenAlex (that’s today).
Tuesday: our new logo.
Wednesday: the beta launch of our fully rewritten OpenAlex codebase.

When we say we’re focusing on OpenAlex, it’s not just words—we’re shipping, this week. And there’s more coming in Q4:

A new API endpoint to directly download PDFs and parsed PDFs.
A self-serve curation portal (think Wikipedia editing, but for scholarly metadata), where your changes go live in a day or two.
A new vector search API.
Improved funder coverage, thanks to our new Wellcome Trust grant.

After a year of rebuilding, we’ve finally got the tools and the focus we need to start delivering more substantively on our vision: a universal, open library of scholarly information. We’re energized. We’re ready. We’re OpenAlex.

Unpaywall improvements: more gold, better green

Posted on August 28, 2025 by Jason

We recently announced that we’d completely rewritten Unpaywall to make it faster, more accurate, and (most importantly) easier to fix and improve. We wanted to move Unpaywall from product to process, something we could continuously improve along with the community.

Well, we’ve been working hard on that over the last few months and here’s an update!

Better Gold coverage

By far the most common OA color is gold. In fact, based on our manual sampling, 25% of Crossref DOIs are gold OA, which is much higher than I’d expected and much higher than it used to be. (note: in this and all following stats we exclude component DOIs, which aren’t indexed in Unpaywall).

Coverage of gold is very tricky, because it’s all about the status of the work’s source, not the work itself. So we need very comprehensive coverage of sources, which is as hard as it sounds.

Of course there’s DOAJ, which is fantastic, but it only covers a small subset of gold OA journals. And even for those journals, DOAJ often only tells us that a given journal has been fully OA since a certain date; we still need to figure out whether the back catalog is open.

In recent weeks, we’ve finished several projects to add the “this is gold OA” flag to new journals:

We crawled 50k OJS journals, adding gold status to 17,000 of them (many thanks to Juan Pablo Alperin and Diego Chavarro for their help in getting a list of OJS journals!)
We marked 1,200 new journals gold using data from J-STAGE.
We marked 100 new journals gold using data from SciELO.
We added gold status to several dozen journals from fully-OA publishers including MDPI, Academic Journals, and Edorium.

We also modified our algorithm to assign gold instead of bronze when we know an article is OA, but we can’t figure out its source. Since gold is 2.5x more common than bronze, this will result in fewer errors overall.
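The arithmetic behind that default can be sketched roughly as follows. This is only an illustration: it assumes that OA articles with an unidentified source split between gold and bronze in the same 2.5:1 ratio as everything else, and the function name is hypothetical rather than anything from the Unpaywall codebase.

```python
def default_error_rates(gold_to_bronze_ratio: float) -> dict[str, float]:
    """Error rate of always guessing one color when the truth is gold or bronze.

    Assumption (for illustration only): unknown-source OA articles follow
    the same gold:bronze ratio as the overall population.
    """
    gold_share = gold_to_bronze_ratio / (gold_to_bronze_ratio + 1)
    bronze_share = 1 - gold_share
    # Guessing gold is wrong whenever the article is really bronze, and vice versa.
    return {"guess_gold": bronze_share, "guess_bronze": gold_share}


rates = default_error_rates(2.5)
print(f"default to gold:   wrong {rates['guess_gold']:.1%} of the time")
print(f"default to bronze: wrong {rates['guess_bronze']:.1%} of the time")
```

Under that assumption, defaulting to gold is wrong a bit under 30% of the time, versus a bit over 70% for defaulting to bronze.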

Overall, this has made a big change in our gold coverage: now 19% of Unpaywall is gold, compared to 14% in May.

Green OA

We’ve made several changes in our green OA approach. These have not increased our total green percentage, but they have made our assignment of colors more consistent.

The rule for green has always been that if the best OA location is in a repository, it’s green. But, like gold, this depends heavily on our correctly describing the source as a repository. We’re very good at this for institutional repositories, but we’ve not been so good for preprint and data repositories, which are both much more common today than they were when we started Unpaywall.
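Here’s a minimal sketch of why the source description matters so much for green. The source-type labels and the function are hypothetical stand-ins, not the real Unpaywall or OpenAlex schema. The only rule taken from this post is "best OA location in a repository means green."

```python
# Hypothetical source-type labels; the real schema may differ.
NARROW_REPOSITORY_TYPES = {"institutional_repository"}
BROAD_REPOSITORY_TYPES = NARROW_REPOSITORY_TYPES | {
    "preprint_repository",
    "data_repository",
}


def is_green(best_location_type: str, repository_types: set[str]) -> bool:
    """Green if the best OA location's source is recognized as a repository."""
    return best_location_type in repository_types


# A work whose best OA location is a preprint server:
print(is_green("preprint_repository", NARROW_REPOSITORY_TYPES))  # False
print(is_green("preprint_repository", BROAD_REPOSITORY_TYPES))   # True
```

The rule itself never changes. What changes is which sources count as repositories, and that alone can change the color a work is assigned.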

Other changes

We fixed a bug causing us to list works published under the Elsevier User License as Hybrid. Since we don’t consider that to be an OA license, we moved these to bronze.

We marked SSRN as an open repository. It’s on the bubble, but since all works are available free right away, for us it counts.

Results

The “ground truth” dataset is a random sample of 500 DOIs from Crossref. It excludes component DOIs and DOIs that don’t resolve. Each DOI is manually annotated by our team, which often includes doing lots of research on the journals and repositories that host the content. The definitions of oa_status colors come from here, which is in turn based on the original 2018 Unpaywall paper in PeerJ.
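For a rough sense of how precise a 500-DOI sample can be, here’s a back-of-the-envelope sketch using the standard normal approximation for a sampled proportion. It assumes the 25% gold figure quoted earlier comes from a sample of about this size, which this post doesn’t state outright; the function name is just illustrative.

```python
import math


def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(0.25, 500)
print(f"25% +/- {moe:.1%}")
```

So differences of a few percentage points between Unpaywall and the ground truth sample are within what sampling noise alone could produce.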

As you can see, we’re moving in the correct direction when it comes to gold and hybrid, green isn’t changing, and bronze coverage is going backwards a bit, although it’s still pretty close to the ground truth number. Our roadmap will prioritize green and gold for the next few months at least.

The future

The most important change for Unpaywall moving forward is the upcoming rewrite of OpenAlex, which will be gradually rolled out October-November of this year. That’s because when this rewrite is deployed, OpenAlex and Unpaywall will finally share the exact same codebase. Of course this will eliminate those pesky, embarrassing bugs where Unpaywall and OpenAlex disagree. But more importantly, it’ll link the large Unpaywall and OpenAlex communities, allowing everyone to improve both products together.

Even before that, though, we’ll be unveiling another exciting change: a new and improved curation portal. This will make it easier to fix article-level bugs in Unpaywall, including bugs that the current curation solution doesn’t address (like missing PDF URLs and incorrect licenses). Even cooler, though, it’ll allow users to fix source-level bugs, particularly journals that should be marked gold but aren’t. Although someday AI might let us automate this, for now we think that active community curation is the only viable way to keep that data accurate and up to date. The unification of the OpenAlex and Unpaywall codebases means that all these changes will propagate to both systems within days.

Ok, that’s all for now! Thanks for your support and as always, please get in touch with any suggestions or feedback!