We’re now OpenAlex

For years, we’ve been working under the name OurResearch. That name sat at the top of our org chart, with three child projects under it: OpenAlex, Unpaywall, and Unsub.

Starting today, things are simpler: that org chart now has just one parent—OpenAlex—with Unpaywall and Unsub beneath it.

Why the change? Three reasons:

1. Fewer brands is clearer

We’re a tiny team, and having so many brands has always been confusing. People wondered: are we OurResearch (or Our Research), or OpenAlex, or Unpaywall, or something else? From now on, the answer is simple: we’re OpenAlex.

2. OpenAlex is what we do

More and more, OpenAlex is the center of our work. It’s our biggest project and the one that takes most of our time. And it’s also the data engine behind our other projects: Unpaywall and Unsub both run on OpenAlex data. In fact, with the launch of our fully rewritten OpenAlex codebase (codenamed Walden) this week, Unpaywall runs as a subroutine of the OpenAlex codebase.

So in a real sense, Unpaywall and Unsub are just friendly wrappers around OpenAlex. Improving OpenAlex improves them automatically.

And the name OpenAlex, with its homage to the ancient Library of Alexandria, captures our long-term vision to gather, organize, and make open all scholarly information.

3. New name, new start

Legally, nothing dramatic is happening—our official name has always been Impactstory, Inc., and “OurResearch” was just a DBA. But this moment is more than just a bookkeeping change.

This is a new chapter for us. The past year has been tough: not much visible progress, a lot of repaying technical debt, and a long slog to rewrite our entire codebase. But that rewrite launches (in beta) this week. And with a fresh codebase comes a fresh start: we get to focus harder, move faster, and pour our energy into making OpenAlex as comprehensive, accurate, and open as possible.

So yes, the name change simplifies things. But more importantly, it marks a new focus and a renewed commitment to our vision: building a universal library of scholarship.

And while we’ll continue to support Unpaywall and Unsub for now, we want to be transparent: OpenAlex is the future. As its functionality grows over the next year or two, Unpaywall and Unsub users will be able to meet their use-cases directly via OpenAlex. The rising tide of OpenAlex lifts all boats.


This week is about OpenAlex

This post is the first of three announcements:

  • Monday: our name change to OpenAlex (that’s today).
  • Tuesday: our new logo.
  • Wednesday: the beta launch of our fully rewritten OpenAlex codebase.

When we say we’re focusing on OpenAlex, it’s not just words—we’re shipping, this week. And there’s more coming in Q4:

  • A new API endpoint to directly download PDFs and parsed PDFs.
  • A self-serve curation portal (think Wikipedia editing, but for scholarly metadata), where your changes go live in a day or two.
  • A new vector search API.
  • Improved funder coverage, thanks to our new Wellcome Trust grant.

After a year of rebuilding, we’ve finally got the tools and the focus we need to start delivering more substantively on our vision: a universal, open library of scholarly information. We’re energized. We’re ready. We’re OpenAlex.

Unpaywall improvements: more gold, better green

We recently announced that we’d completely rewritten Unpaywall to make it faster, more accurate, and (most importantly) easier to fix and improve. We wanted to move Unpaywall from product to process, something we could continuously improve along with the community.

Well, we’ve been working hard on that over the last few months and here’s an update!

Better Gold coverage

By far the most common OA color is gold. In fact, based on our manual sampling, 25% of Crossref DOIs are gold OA, which is much higher than I’d expected and much higher than it used to be. (note: in this and all following stats we exclude component DOIs, which aren’t indexed in Unpaywall).

Coverage of gold is very tricky, because it’s all about the status of the work’s source, not the work itself. So we need very comprehensive coverage of sources, which is as hard as it sounds.

Of course there’s DOAJ, which is fantastic, but it only covers a small subset of gold OA journals. And even for those journals, DOAJ often only tells us that a given journal is fully OA since a certain date; we still need to figure out if the back catalog is open or not.

In recent weeks, we’ve finished several projects to add the “this is gold OA” flag to new journals:

  • We crawled 50k OJS journals, adding gold status to 17,000 of them (many thanks to Juan Pablo Alperin and Diego Chavarro for their help in getting a list of OJS journals!)
  • We marked 1,200 new journals gold using data from J-STAGE.
  • We marked 100 new journals gold using data from SciELO
  • We added gold status to several dozen journals from fully-OA publishers including MDPI, Academic Journals, and Edorium.

We also modified our algorithm to assign gold instead of bronze when we know an article is OA, but we can’t figure out its source. Since gold is 2.5x more common than bronze, this will result in fewer errors overall.
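To make that reasoning concrete, here’s a minimal sketch of the arithmetic. It assumes (and this is our simplification for illustration, not a measured fact) that OA articles with an unknown source split between gold and bronze in the same 2.5-to-1 ratio as everything else. The function name is hypothetical.

```python
def default_error_rates(gold_to_bronze_ratio: float) -> dict[str, float]:
    """Expected error rate for each possible default label.

    Assumption (illustrative only): unknown-source OA articles are gold vs.
    bronze in the same ratio as the overall population.
    """
    p_gold = gold_to_bronze_ratio / (gold_to_bronze_ratio + 1)
    return {
        "gold": 1 - p_gold,   # wrong whenever the article was really bronze
        "bronze": p_gold,     # wrong whenever the article was really gold
    }


rates = default_error_rates(2.5)
for label, rate in rates.items():
    print(f"default to {label}: wrong about {rate:.0%} of the time")
```

Under that assumption, defaulting to gold is wrong a bit under a third of the time, while defaulting to bronze is wrong a bit over two thirds of the time.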

Overall, this has made a big change in our gold coverage: now 19% of Unpaywall is gold, compared to 14% in May.

Green OA

We’ve made several changes in our green OA approach. These have not increased our total green percentage, but they have made our assignment of colors more consistent.

The rule for green has always been that if the best OA location is in a repository, it’s green. But, like gold, this is very dependent on us correctly describing the source as a repository. We’re very good at this for institutional repositories, but we’ve not been so good for preprint and data repositories, which are both much more common today than they were when we started Unpaywall.

Other changes

We fixed a bug causing us to list works published under the Elsevier User License as Hybrid. Since we don’t consider that to be an OA license, we moved these to bronze.

We marked SSRN as an open repository…it’s on the bubble but since all works are available free right away, for us it counts.
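Pulling the color rules from this post together, here’s a rough sketch of how they might fit into one function. This is an illustration, not our production code: the `Location` class, the `oa_status` signature, the license test, and the `elsevier-user-license` string are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    host_type: str                 # "publisher" or "repository"
    license: Optional[str] = None  # e.g. "cc-by"; None if unknown


def is_open_license(license: Optional[str]) -> bool:
    # Simplifying assumption: only Creative Commons style identifiers count.
    # Publisher-specific licenses (like the Elsevier User License) do not.
    return license is not None and license.lower().startswith("cc")


def oa_status(best: Optional[Location], source_is_fully_oa: Optional[bool]) -> str:
    """Assign a color from the best OA location and what we know of the source.

    source_is_fully_oa is None when we can't figure out the source at all.
    """
    if best is None:
        return "closed"
    if best.host_type == "repository":
        return "green"   # best OA location is in a repository
    if source_is_fully_oa is None or source_is_fully_oa:
        return "gold"    # fully-OA source, or unknown source (gold is likelier)
    # Publisher-hosted copy in a journal that isn't fully OA.
    return "hybrid" if is_open_license(best.license) else "bronze"


print(oa_status(Location("publisher", "elsevier-user-license"), False))
```

The point of the sketch is that three of the four branches depend on describing the source correctly, which is why source-level curation matters so much.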

Results

The “ground truth” dataset is a random sample of 500 DOIs from Crossref. It excludes component DOIs and DOIs that don’t resolve. Each DOI is manually annotated by our team, which often includes doing lots of research on the journals and repositories that host the content. The definitions of oa_status colors come from here, which is in turn based on the original 2018 Unpaywall paper in PeerJ.
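For a sense of how precise a 500-DOI sample can be, here’s a quick back-of-the-envelope sketch. It uses the standard normal approximation for a sampled proportion (our choice for illustration; the function name is hypothetical) and plugs in the 25% gold figure from above.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p observed in n samples."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.25, 500)
print(f"25% plus or minus {moe:.1%}")
```

So a color that shows up in a quarter of the sample is pinned down to within roughly four percentage points either way, which is worth keeping in mind when comparing small differences.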

As you can see, we’re moving in the correct direction when it comes to gold and hybrid, green isn’t changing, and bronze coverage is going backwards a bit, although it’s still pretty close to the ground truth number. Our roadmap will prioritize green and gold for the next few months at least.

The future

The most important change for Unpaywall moving forward is the upcoming rewrite of OpenAlex, which will be gradually rolled out October-November of this year. That’s because when this rewrite is deployed, OpenAlex and Unpaywall will finally share the exact same codebase. Of course this will eliminate those pesky, embarrassing bugs where Unpaywall and OpenAlex disagree. But more importantly, it’ll link the large Unpaywall and OpenAlex communities, allowing everyone to improve both products together.

Even before that, though, we’ll be unveiling another exciting change: a new and improved curation portal. This will make it easier to fix article-level bugs in Unpaywall, including bugs that the current curation solution doesn’t address (like missing PDF URLs and incorrect licenses). Even cooler, though, it’ll allow users to fix source-level bugs, particularly fixing journals that should be marked gold, but aren’t. Although someday AI might let us automate this, for now, we think that active community curation is the only viable way to keep that data accurate and up to date. The unification of OpenAlex and Unpaywall codebases means that all these changes will propagate to both systems within days.

Ok, that’s all for now! Thanks for your support and as always, please get in touch with any suggestions or feedback!

Podcast episode about Unpaywall


 

I recently had a fun conversation with @ORION_opensci for their just-launched podcast.

The episode is about half an hour long, and covers what @Unpaywall is, who uses it, how it came about, a bit about how it works, thoughts on the importance of #openinfrastructure, the sustainability model, how open jibes with getting money from Elsevier, #PlanS, how to help the #openscience revolution…

Anyway, here’s where you can listen (you can either load it into your Podcast app, or just press “play” on the webpage player):

https://orionopenscience.podbean.com/e/scaling-the-paywall-how-unpaywall-improved-open-access/

(Or here’s the MP3.)

Thanks for having me @OOSP_ORIONPod, it was super fun! And do check out the rest of the episodes as well; they are covering great topics.

Unpaywall extension adds 200,000th active user

We’re thrilled to announce that we’re now supporting over 200,000 active users of the Unpaywall extension for Chrome and Firefox!

The extension, which debuted nearly two years ago, helps users find legal, open access copies of paywalled scholarly articles. Since its release, the extension has been used more than 45 million times, finding an open access copy in about half of those. We’ve also been featured in The Chronicle of Higher Ed, TechCrunch, Lifehacker, Boing Boing, and Nature (twice).

However, although the extension gets the press, the database powering the extension is the real star. There are millions of people using the Unpaywall database every day:

  • We deliver nearly one million OA papers every day to users worldwide via our open API…that’s 10 papers every second!
  • Over 1,600 academic libraries use our SFX integration to automatically find and deliver OA copies of articles when they have no subscription access.
  • If you’re using an academic discovery tool, it probably includes Unpaywall data…we’re integrated into Web of Science, Europe PubMed Central, WorldCat, Scopus, Dimensions, and many others.
  • Our data is used to inform and monitor OA policy at organizations like the US NIH, UK Research and Innovation, the Swiss National Science Foundation, the Wellcome Trust, the European Open Science Monitor, and many others.

The Unpaywall database gets information from over 50,000 academic journals and 5,000 scholarly repositories and archives, tracking OA status for more than 100 million articles. You can access this data for free using our open API, or use our free web-based query tool. Or if you prefer, you can just download the whole database for free.
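If you’d like to try the API, here’s a minimal sketch using only the Python standard library. It assumes the v2 REST endpoint, which takes a DOI in the path and your email as a query parameter; the helper names and the sample record below are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI."""
    return API_ROOT + quote(doi) + "?" + urlencode({"email": email})


def fetch_record(doi: str, email: str) -> dict:
    """Fetch the JSON record for a DOI (makes a network request)."""
    with urlopen(unpaywall_url(doi, email), timeout=30) as resp:
        return json.load(resp)


def summarize(record: dict) -> dict:
    """Pull out the fields most people want: is it OA, what color, and where."""
    best = record.get("best_oa_location") or {}
    return {
        "is_oa": bool(record.get("is_oa")),
        "oa_status": record.get("oa_status"),
        "url": best.get("url"),
    }


# Hypothetical record, shaped like an API response, so this runs offline.
sample = {
    "is_oa": True,
    "oa_status": "green",
    "best_oa_location": {"url": "https://repository.example.org/paper.pdf"},
}
print(unpaywall_url("10.1234/example", "you@example.com"))
print(summarize(sample))
```

To look up a real article, call `fetch_record` with a real DOI and your own email address, then pass the result to `summarize`.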

Unpaywall is supported via subscriptions to the Unpaywall Data Feed, a high-throughput pipeline providing weekly updates to our free database dump. Thanks to Data Feed subscribers, Unpaywall is completely self-sustaining and uses no grant funding. That makes us real optimistic about our ability to stick around and provide open infrastructure for lots of other cool projects.

Thanks to everyone who has supported this project, and even more, thanks to everyone who has fought for open access. Without y’all, Unpaywall wouldn’t matter. With you: we’re changing the world. Together. Next stop 300k!

We’re building a search engine for academic literature–for everyone


Huzzah! Today we’re announcing an $850k grant from the Arcadia Fund to build a new way for folks to find, read, and understand the scholarly literature.

Wait, another search engine? Really?

Yep. But this one’s a little different: there are already a lot of ways for academic researchers to find academic literature…we’re building one for everyone else.

We’re aiming to meet the information needs of citizen scientists, patients, K-12 teachers, medical practitioners, social workers, community college students, policy makers, and millions more. What they all have in common: they’re folks who’d benefit from access to the scholarly record, but they’ve historically been locked out. They’ve had no access to the content or the context of the scholarly conversation.

Problem: it’s hard to access content

Traditionally, the scholarly literature was paywalled, cutting off access to the content. The Open Access movement is on the way to solving this: Half of new articles are now free to read somewhere, and that number is growing. The catch is that there are more than 50,000 different “somewheres” on web servers around the world, so we need a central index to find it. No one’s done a good job of this yet (Google Scholar gets close, but it’s aimed at specialists, not regular people. It’s also 100% proprietary, closed-source, closed-data, and subject to disappearing at Google’s whim.)

Problem: it’s hard to access context

Context is the stuff that makes an article understandable for a specialist, but gobbledegook to the rest of us. So that includes everything from field-specific jargon, to strategies for how to skim to the key findings, to knowledge of core concepts like p-values. Specialists have access to context. Regular folks don’t. This makes reading the scholarly literature like reading Shakespeare without notes: you get glimmers of beauty, but without some help it’s mostly just frustrating.

Solution: easy access to the content and context of research literature.

Our plan: provide access to both content and context, for free, in one place. To do that, we’re going to bring together an open database of OA papers with a suite of AI-powered support tools we’re calling an Explanation Engine.

We’ve already finished the database of OA papers. So that’s good. With the free Unpaywall database, we’ve now got 20 million OA articles from 50k sources, built on open source, available as open data, and with a working nonprofit sustainability model.

We’re building the “AI-powered support tools” now. What kind of tools? Well, let’s go back to the Hamlet example…today, publishers solve the context problem for readers of Shakespeare by adding notes to the text that define and explain difficult words and phrases. We’re gonna do the same thing for 20 million scholarly articles. And that’s just the start…we’re also working on concept maps, automated plain-language translations (think automatic Simple Wikipedia), structured abstracts, topic guides, and more. Thanks to recent progress in AI, all this can be automated, so we can do it at scale. That’s new. And it’s big.

The payoff

When Microsoft launched Altair BASIC for the new “personal computers,” there were already plenty of programming environments for experts. But here was one accessible to everyone else. That was new. And ultimately it launched the PC revolution, bringing computing into the lives of regular folks. We think it’s time that same kind of movement happened in the world of knowledge.

From a business perspective, you might call this a blue ocean strategy. From a social perspective (ours), this is a chance to finally cash the cheques written by the Open Access movement. It’s a chance to truly open up access to the frontiers of human knowledge to all humans.

If that sounds like your jam, we’d love your support: tell your friends, sign up for early access, and follow us for updates. It’s gonna be quite an adventure.

Here’s the press release.

When will everything be Open Access?

OA continues to grow. But when will it be…done? When will everything be published as Open Access?

Using data from our recently-published PeerJ OA study, we took a crack at answering that question. The data we’re using comes from the Unpaywall database–now the largest open database of OA articles ever created, with comprehensive data on over 90 million articles. Check out the paper for lots more details on how we assembled the data, along with assessments of accuracy and other goodies. But without further ado, here’s our projection of OA growth:

[Figure: OA growth over time, with a solid line for observed data and a dotted line for the projection]

In the study, we found that OA is increasingly likely for newer articles since around 1990. That’s the solid line part of the graph, and is based on hard data.

But since the curve is so regular, it was tempting to extend it to see what would happen at the current rate of increase. That’s the dotted line in the figure above. Of course it’s a pretty facile projection, in that no effort has been made to model the underlying processes. #limitations #futurework 😀. Moreover, the 2040 number is clearly too conservative since it doesn’t account for discontinuities–like the surge in OA we’ll see in 2020 when new European mandates take effect.

But while the dates can’t be known for certain, what the data makes very clear is that we are headed for an era of universal OA. It’s not a question of if, but when. And that’s great news.

Open Access, coming to a workflow near you: welcome to the year of Ubiquitous OA

Thanks to 20 years of OA innovation and advocacy, today you can legally access around half the recent research literature for free. However, in practice, much of this free literature is not as open as we’d like it to be, because it’s hard for readers to find the OA version.

OA lives on repositories and publisher websites. But very few people visit these sources directly to find a given article. Instead, people rely on the search tools that are already part of their existing workflows. Historically, these haven’t done a great job surfacing OA resources. Google, for instance, often fails to index OA versions, in addition to indexing content of dubious provenance. OA aggregators like BASE, CORE, and OpenAIRE aim to solve this by emphasizing OA coverage, but they require researchers to add a second or third search step to their existing workflows–something researchers have been reluctant to do.

So in addition to the well-known access problem, we also have a discovery problem. On the one hand, there’s a healthy, efficient OA infrastructure in journals and repositories. On the other, we have millions of individual readers doing their own thing. We need to connect these. We need to cover this last mile between the infrastructure and the individual user, and we need to make that connection easy and seamless and ubiquitous. Until we do, OA is writing a check it can’t fully cash.

But the news is good: over the last year, several efforts have emerged to cover that last mile. Our contribution was Unpaywall: an extension that shows a green tab in your browser on articles where there’s an OA version available. Unpaywall has enjoyed lots of success, adding over 100,000 active users in under a year. Moreover, the backend database of Unpaywall (formerly called oaDOI) can be integrated into any number of existing tools, making it easier to spread OA content all over the place. For instance, we’re already seeing over a million uses every day from library link resolvers.

Our most recent integration takes this to a new level, and we’re so excited about it: thanks to a new partnership between Impactstory and Clarivate Analytics, data from Impactstory’s Unpaywall database is now live in the Web of Science, making it the first editorially-curated and publisher-neutral resource to implement this technology. Web of Science has been able to use Unpaywall data to discover and link to millions more OA records amongst their existing content.  This enables millions of Web of Science users around the world to link straight from their search results to a trusted, legal, peer-reviewed OA version—and they can also filter search results by the different versions of OA.

This is a big deal because abstracting and indexing (A&I) systems like Web of Science are currently the most important way researchers access literature. And though it’s by no means the only A&I system out there, Web of Science is the most respected and most prevalent. Every month, millions of users access literature through Web of Science, and now each and every one of them will see more OA options for articles they might not otherwise discover, right alongside subscribed content. Every day. What a huge change from the days we had to convince folks that OA was legitimate at all! It’s a new era.

A new era: that’s not just a hyperbolic phrase. We think this year marks a turning point in the OA narrative. We’re moving out of the author-focused, advocacy-focused initial phase, and into a more mature era of ubiquitous Open Access, characterized by deep integration of OA into researcher workflows and value-add services built on top of the immense OA corpus. This is the era of user-focused OA.

As OA becomes the default state for published research, tools that centralize, mine, index, search, organize, and extract knowledge from papers suddenly become massively more powerful. Integrations between Unpaywall and commercial services aren’t generating this new era, but they are one of the hallmarks of it. We’re not making new OA, but rather starting to leverage the massive OA corpus now available. In the last year, many others have begun to do this as well. Many, many more will follow.

For years, we in the OA advocate community have been arguing that a critical mass of OA would not just improve scholarly communication, it would transform it. This is finally beginning to happen, and we think this partnership with Web of Science is an early part of that transformation. Now, a subscription to Web of Science—something most academic libraries globally already have—is also a subscription to a database of millions of free-to-read OA articles.

We’ve never been more excited about the future of OA–or more thankful for all the work the OA community as a whole has done to get here. And we can’t wait to keep working together with the community to help make the vision of ubiquitous open access a reality.

Green Open Access comes of age

This morning David Prosser, executive director of Research Libraries UK, tweeted, “So we have @unpaywall, @oaDOI_org, PubMed icons – is the green #OA infrastructure reaching maturity?” (link).

We love this observation, and not just because two of the three projects he mentioned are from us at Impactstory 😀. We love it because we agree: Green OA infrastructure is at a tipping point where two decades of investment, a slew of new tools, and a flurry of new government mandates are about to make Green OA the scholarly publishing game-changer.

A lot of folks have suggested that Sci-Hub is scholarly publishing’s “Napster moment,” where the internet finally disrupts a very resilient, profitable niche market. That’s probably true. But just as the music industry shut down Napster, Elsevier will likely be able to shut down Sci-Hub. They’ve got both the money and the legal (though not moral) high ground and that’s a tough combo to beat.

But the future is what comes after Napster. It’s in the iTunes and the Spotifys of scholarly communication. We’ve built something to help create this future. It’s Unpaywall, a browser extension that instantly finds free, legal Green OA copies of paywalled research papers as you browse–like a master key to the research literature. If you haven’t tried it yet, install Unpaywall for free and give it a try.

Unpaywall has reached 5,000 active users in our first ten days of pre-release.

But Unpaywall is far from the only indication that we’re reaching a Green OA inflection point. Today is a great day to appreciate this, as there’s amazing Green OA news everywhere you look:

  • Unpaywall reached the 5000 Active Users milestone. We’re now delivering tens of thousands of OA articles to users in over 100 countries, and growing fast.
  • PubMed announced Institutional Repository LinkOut, which links every PubMed article to a free Green copy in institutional repositories where available. This is huge, since PubMed is one of the world’s most important portals to the research literature.
  • The Open Access Button announced a new integration with interlibrary loan that will make it even more useful for researchers looking for open content. Along with the interlibrary loan request, they send instructions to authors to help them self-archive closed publications.

Over the next few years, we’re going to see an explosion in the amount of research available openly, as government mandates in the US, UK, Europe, and beyond take force. As that happens, the raw material is there to build completely new ways of searching, sharing, and accessing the research literature.
We think Unpaywall is a really powerful example: When there’s a big Get It Free button next to the Pay Money button on publisher pages, it starts to look like the game is changing. And it is changing. Unpaywall is just the beginning of the amazing open-access future we’re going to see. We can’t wait!