What should a FAIR checker include?

The Wellcome Trust is considering funding a tool that would report on the FAIR status of research outputs.  We recently responded to their Request for Information with some ideas to refine their initial plan and thought we’d share them here!

a) Include Openness Assessment

We believe the planned software tool should not only assess the FAIRness of research outputs, but also their Openness.  As described in the recent Final Report and Action Plan from the European Commission Expert Group on FAIR Data:  “Data can be FAIR or Open, both or neither. The greatest benefits come when data are both FAIR and Open, as the lack of restrictions supports the widest possible reuse, and reuse at scale.”    

This refinement is essential for several reasons.  First, we believe researchers will expect something called a “FAIR assessment” to include an assessment of Openness, and will be confused when it does not, leading to poor understanding of the system.  Second, the benefit of openness is clear to everyone, which increases researchers’ motivation to use the tool. Third, Wellcome has already done a great job of highlighting the need for openness, so including it makes the tool an incremental addition to that work rather than a different, new set of requirements with an unclear relationship.  Fourth, the community needs an openness assessment tool, one would fit very well in the proposed tool, and its anticipated popularity and exposure would help the FAIR assessment gain traction.

 

b) Require that the tool produce Open Data, not just be Open Source

The project brief was very clear that the tool needs to be Open Source, with a liberal license.  This is great. We suggest the brief add that the data provided by the tool will be Open Data.  Ideally the brief would suggest a license for the data (CC0, or an open database license that facilitates reuse, including commercial reuse) and data delivery specifications.  For data delivery we suggest both regular full data dumps and a machine-readable, free, open JSON API that requires minimal registration, is high-performing (< 1 second response time), can handle a high concurrent load, has high daily quota limits, and can handle at least a million calls per day across the system.
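As a concrete sketch of what such an API might return, here's a tiny parser for an imagined assessment record. The endpoint shape, field names, and scoring scale are all illustrative assumptions on our part, not part of any real specification:

```python
import json

# Hypothetical response shape for a FAIR/Openness assessment API --
# the field names and scoring scale below are illustrative only.
SAMPLE_RESPONSE = json.dumps({
    "id": "10.5281/zenodo.1234567",
    "fair_score": 17,
    "max_score": 22,
    "openness": {"is_open": True, "license": "CC0-1.0"},
})

def summarize_assessment(raw_json):
    """Return a one-line human-readable summary of an assessment record."""
    record = json.loads(raw_json)
    pct = 100 * record["fair_score"] / record["max_score"]
    open_flag = "open" if record["openness"]["is_open"] else "closed"
    return f"{record['id']}: FAIR {pct:.0f}%, {open_flag} ({record['openness']['license']})"

print(summarize_assessment(SAMPLE_RESPONSE))
# -> 10.5281/zenodo.1234567: FAIR 77%, open (CC0-1.0)
```

The point of specifying a machine-readable shape like this up front is that dumps and API responses can then share one schema, which makes the bulk data and the per-object lookups interchangeable for integrators.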

It could also specify that money could be charged for support-level agreements on the API for institutions that want them, for above-normal API quotas, for more frequent data dumps, or similar.  This is similar to our Unpaywall open data model, which has worked very well.

 

c) Pre-ingest hundreds of millions of research objects

The project brief should make it more explicit that the software tool needs to launch with pre-calculated scores/badges for hundreds of millions of research objects.   We luckily live in a world where many research objects are already listed in repositories like Crossref, DataCite, GitHub, etc. These should be ingested and form the basis of the dataset used by the tool.  This pre-ingesting is implicitly needed for some of the leaderboards and aggregations specified by the brief; in our opinion it should be made explicit. It will also allow large-scale calibration of scores and the export of large-scale datasets to support policy research, additional tools, etc., and it would assure a high-performing system, which cannot be assured when FAIR assessments are made ad hoc upon request for most products.

(Admittedly, gathering research objects registered in such sources naturally selects research objects that have identifiers and a certain standard and kind of metadata and FAIR level, so the resulting set isn’t representative of all research objects. This needs to be considered when using it for calibration.)

 

d) More details on aggregation

The brief doesn’t include enough details on aggregation.  In our opinion aggregation is key.

Aggregation supports context for FAIR metrics and badges (through percentiles, etc.), facilitates publicity, and inspires change and improvement.  Most research objects do not currently have metadata that supports interesting aggregation — datasets are rarely associated with an ORCID or institution, for instance.  The RFP should require applicants to specify how they will facilitate aggregation. We anticipate the proposals will include a combination of automated approaches using metadata (using Crossref and DataCite metadata, plus PubMed LinkOut data, to associate datasets with papers, which are themselves associated with ORCIDs, clinical trial IDs, and GRID institutional identifiers), text mining (to associate GitHub links with papers), and methods for CSV uploads to link identifiers to aggregation groups.
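To illustrate the metadata-based approach, here's a minimal sketch that groups DOIs by ORCID using Crossref-style author records. The `ORCID` field does appear in real Crossref work metadata; the sample works below are made up for illustration:

```python
from collections import defaultdict

# Made-up works in roughly the shape Crossref returns: each work has a
# DOI and a list of author records, some of which carry an ORCID.
works = [
    {"DOI": "10.1234/a", "author": [
        {"family": "Liu", "ORCID": "https://orcid.org/0000-0001-0000-0001"}]},
    {"DOI": "10.1234/b", "author": [
        {"family": "Liu", "ORCID": "https://orcid.org/0000-0001-0000-0001"},
        {"family": "Rao"}]},  # no ORCID: this author can't be aggregated
]

def group_by_orcid(works):
    """Map each ORCID found in author metadata to the DOIs it appears on."""
    groups = defaultdict(list)
    for work in works:
        for author in work.get("author", []):
            if "ORCID" in author:
                groups[author["ORCID"]].append(work["DOI"])
    return dict(groups)

print(group_by_orcid(works))
```

The same grouping pattern extends to any other identifier in the metadata (funder IDs, institutional identifiers, clinical trial IDs), which is why identifier coverage matters so much for aggregation.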

 

e) Include Actionable Steps for immediate FAIR score improvement

The brief should specify that, after showing researchers their scores, the tool links them to actionable steps they can take to improve their FAIR and Open Data scores.  These could simply be how-to guides — how to put your software on GitHub, how to specify a license for your dataset, how to make your paper Open Access by uploading the accepted manuscript, etc. They should walk the researcher through improving their score on existing products, and then immediately recalculate the FAIR score so the researcher can see progress.  If this sort of recalculation ability is not built into the design from the beginning, it can lead to system designs that make it difficult to add later.
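At its simplest, this is a lookup from failed checks to how-to guides. A minimal sketch, in which the check names and guide URLs are entirely hypothetical:

```python
# Hypothetical mapping from failed FAIR/Openness checks to how-to guides.
# Check names and URLs are invented for illustration.
GUIDES = {
    "no_license": "https://example.org/how-to-choose-a-license",
    "not_in_repository": "https://example.org/how-to-deposit-your-data",
    "no_identifier": "https://example.org/how-to-get-a-doi",
}

def actionable_steps(failed_checks):
    """Return (check, guide) pairs for every failed check we have a guide for."""
    return [(check, GUIDES[check]) for check in failed_checks if check in GUIDES]

steps = actionable_steps(["no_license", "no_identifier", "unknown_check"])
print(steps)
```

In a real system each guide would end with a "recheck now" action that re-runs the assessment for that object, so the researcher sees the score move immediately.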

 

f) Open grants process for this RFI

The RFP should give applicants the option to make their proposals public (and encourage them to do so), and the grant reviews should be public.  Or at least it should take steps in that direction, in the spirit of incremental improvement on Wellcome’s great Open Research Fund mechanisms.

 

Unpaywall extension adds 200,000th active user

We’re thrilled to announce that we’re now supporting over 200,000 active users of the Unpaywall extension for Chrome and Firefox!

The extension, which debuted nearly two years ago, helps users find legal, open access copies of paywalled scholarly articles. Since its release, the extension has been used more than 45 million times, finding an open access copy in about half of those. We’ve also been featured in The Chronicle of Higher Ed, TechCrunch, Lifehacker, Boing Boing, and Nature (twice).

However, although the extension gets the press, the database powering the extension is the real star. There are millions of people using the Unpaywall database every day:

  • We deliver nearly one million OA papers every day to users worldwide via our open API…that’s 10 papers every second!
  • Over 1,600 academic libraries use our SFX integration to automatically find and deliver OA copies of articles when they have no subscription access.
  • If you’re using an academic discovery tool, it probably includes Unpaywall data…we’re integrated into Web of Science, Europe PubMed Central, WorldCat, Scopus, Dimensions, and many others.
  • Our data is used to inform and monitor OA policy at organizations like the US NIH, UK Research and Innovation, the Swiss National Science Foundation, the Wellcome Trust, the European Open Science Monitor, and many others.

The Unpaywall database gets information from over 50,000 academic journals and 5,000 scholarly repositories and archives, tracking OA status for more than 100 million articles. You can access this data for free using our open API, or use our free web-based query tool. Or if you prefer, you can just download the whole database for free.
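For the curious, here's roughly what using the open API looks like. The per-DOI v2 endpoint and the `best_oa_location` response field match the Unpaywall API as documented at the time of writing (check unpaywall.org for the current spec); to keep the sketch self-contained we parse a canned, made-up response rather than hitting the network:

```python
from urllib.parse import quote

def unpaywall_url(doi, email):
    """Build the per-DOI Unpaywall API request URL (v2 endpoint)."""
    return f"https://api.unpaywall.org/v2/{quote(doi)}?email={email}"

def best_oa_url(record):
    """Return the best OA location's URL, or None if the article isn't OA."""
    loc = record.get("best_oa_location") or {}
    return loc.get("url")

# A made-up response in the documented shape, for illustration only.
sample = {"doi": "10.1234/example.doi", "is_oa": True,
          "best_oa_location": {"url": "https://example.org/oa-copy.pdf"}}

print(unpaywall_url("10.1234/example.doi", "you@example.com"))
print(best_oa_url(sample))
```

In a live call you would fetch `unpaywall_url(...)` with any HTTP client and feed the decoded JSON to `best_oa_url`.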

Unpaywall is supported via subscriptions to the Unpaywall Data Feed, a high-throughput pipeline providing weekly updates to our free database dump. Thanks to Data Feed subscribers, Unpaywall is completely self-sustaining and uses no grant funding. That makes us real optimistic about our ability to stick around and provide open infrastructure for lots of other cool projects.

Thanks to everyone who has supported this project, and even more, thanks to everyone who has fought for open access. Without y’all, Unpaywall wouldn’t matter. With you: we’re changing the world. Together. Next stop 300k!

It’s time to insist on #openinfrastructure for #openscience


It’s time.  In the last month there’ve been three events that suggest now is the time to start insisting on open infrastructure for open science:

The first event was the publication of two separate recommendations/plans on open science, a report by the National Academies in the US, and Plan S by the EU on open access.  Notably, although comprehensive and bold in many other regards, neither report/plan called for open infrastructure to underpin the proposed open science initiatives.

Peter Suber put it well in his comments on Plan S:

the plan promises support for OA infrastructure, which is good. But it never commits to open infrastructure, that is, platforms running on open-source software, under open standards, with open APIs for interoperability, preferably owned or hosted by non-profit organizations. This omission invites the fate that befell bepress and SSRN, but this time for all European research.

The second event was the launch of Google’s Dataset Search — without an API.

Why do we care?  Because of opportunity cost.  Google Scholar doesn’t have an API, and Google has said it never will.  That means that no one has been able to integrate Google Scholar results into their workflows or products.  This has had a huge opportunity cost for scholarship.  It’s hard to measure, of course (opportunity costs always are), but we can get a sense of it: within two years of the launch of Unpaywall (a product that does a subset of the same task, but with an open API and open bulk data dump), Unpaywall data had been built into 2,000 library workflows, the three primary A&I indexes, competing commercial OA discovery services, many reports, the apps of countless startups, and more integrations in the works.  All of that value-add was waiting for a solution that others could build on.

If we relax and consider the Dataset Search problem solved now that Google has it working, we’re forgoing these same integration possibilities for dataset search that we lost out on for so long with OA discovery.  We need to build open infrastructure: the open APIs and open source solutions that Peter Suber talks about above.

As Peter Kraker put it on Twitter the other day: #dontLeaveItToGoogle.

The third event was of a different sort: a gathering of 58 nonprofit projects working toward Open Science.  It was the first time we’ve gathered together explicitly like that, and the air of change was palpable.

It’s exciting.  We’re doing this.  We’re passionate about providing tools for the open science workflow that embody open infrastructure.

If you are a nonprofit but you weren’t at JROST last month, join in!  It’s just getting going.

 

So.  #openinfrastructure for #openscience.  Everybody in scholarly communication: start talking about it, requesting it, dreaming it, planning it, building it, requiring it, funding it.  It’s not too big a step.  We can do it.  It’s time.

 

ps More great reading on what open infrastructure means from Bilder, Lin, and Neylon (2015) here and from Hindawi here.

pps #openinfrastructure is too long and hard to spell for a rallying cry.  #openinfra??  help 🙂

Reposted from Heather’s personal Research Remix blog.

Impactstory is hiring a full-time developer


We’re looking for a great software developer!  Help us spread the word!  Thanks 🙂

 

ABOUT US

We’re building tools to bring about an open science revolution.  

Impactstory began life as a hackathon project. As the hackathon ended, a few of us migrated into the hotel hallway to continue working, completing the prototype as the hotel started waking up for breakfast. Months of spare-time development followed, then funding. That was five years ago — we’ve got the same excitement for Impactstory today.

We’ve also got great momentum.  The scientific journal Nature recently profiled our main product:  “Unpaywall has become indispensable to many academics, and tie-ins with established scientific search engines could broaden its reach.”  We’re making solid revenue, and it’s time to expand our team.

We’re passionate about open science, and we run our non-profit company openly too.  All of our code is open source, we make our data as open as possible, and we post our grant proposals so that everyone can see both our successful and our unsuccessful ones.  We try to be the change we want to see 🙂

ABOUT THE POSITION

The position is lead dev for Unpaywall, our index of all the free-to-read scholarly papers in the world. Because Unpaywall is surfacing millions of formerly inaccessible open-access scientific papers, it’s growing very quickly, both in terms of usage and revenue. We think it’s a really transformative piece of infrastructure that will enable entire new classes of tools to improve science communication. As a nonprofit, that’s our aim.

We’re looking for someone to take the lead on the tech parts of Unpaywall.  You should know Python and SQL (we use PostgreSQL) and have 5+ years of experience programming, including managing a production software system.  But more importantly, we’re looking for someone who is smart, dedicated, and gets things done! As an early team member you will play a key role in the company as we grow.

The position is remote, with flexible working hours, and plenty of vacation time.  We are a small team so tell us what benefits are important to you and we’ll make them happen.

OUR TEAM

We’re at about a million dollars of revenue (grants and earned income) with just two employees: the two co-founders.  We value kindness, honesty, grit, and smarts. We’re taking our time on this hire, holding out for just the right person.

HOW TO APPLY

Sound like you? Email team@impactstory.org with (1) what appeals to you about this specific job (this part is important to us), (2) a brief summary of your experience directly maintaining and enhancing a production system, (3) a copy of your resume or LinkedIn profile, and (4) a link to your GitHub profile. Thanks!

 

Edited Sept 25, 2018 to add minimum experience and more details on how to apply.

Elsevier becomes newest customer of Unpaywall Data Feed


We’re pleased to announce that Elsevier has become the newest customer of Impactstory’s Unpaywall Data Feed, which provides a weekly feed of changes in Unpaywall, our open database of 20 million open access articles. Elsevier will use the Unpaywall database to make open access content easier to find on Scopus.

Elsevier joins Clarivate Analytics, Digital Science, Zotero, and many other organizations as paying subscribers to the Data Feed.  Paying subscribers provide sustainability for Unpaywall, and fund the many free ways to access Unpaywall data, including complete database snapshots as well as our open API, Simple Query Tool, and browser extension. We’re proud that thousands of academic libraries and other institutions, as well as over 150,000 individual extension users, are using these free tools.

Impactstory’s mission is to help all people access all research products. Adding Elsevier as a Data Feed customer helps us further that mission. Specifically, the new agreement injects OA from our index into the workflows of the many Scopus users worldwide, helping them find and use open research they may never have seen before. So, we’re happy to welcome Elsevier as our latest Data Feed customer.

How do we know Unpaywall won’t be acquired?


Reposted with minor editing from a response Jason gave on the Global Open Access mailing list, July 12 2018.

We’re often asked: How do we know Unpaywall won’t be acquired?  What makes Unpaywall (and the company behind it, Impactstory) different than Bepress, SSRN, Mendeley, Publons, Kopernio, etc?

How can we be sure you won’t be bought by someone whose values don’t align with open science?

There are no credible guarantees I can offer that this won’t happen; nor can any other organization offer them. However, I think stability in the values and governance of Impactstory is a relatively safe bet.  Here’s why (note: I’m not a lawyer and the below isn’t legal advice, obvs):

We’re incorporated as a 501(c)3 nonprofit. This was not true of recently-acquired open science platforms like Mendeley, SSRN, and Bepress, which were all for-profits. We think that’s fine…the world needs for-profits. But we sure weren’t surprised when any of them were acquired. These are for-profit companies, which means they are, er:

For: Profit.  

Legally, their purpose is profit. They may benefit the world in many additional ways,  but their officers and board have a fiduciary duty to deliver a return to investors.

Our officers and board, on the other hand, have a legal fiduciary duty to fulfill our nonprofit mission, even where this doesn’t make much money. I think instead of “nonprofit” it should be called for-mission. Mission is the goal. That can be a big difference.  Jefferson Pooley did a great job articulating the value of the nonprofit structure for scholcomm organizations in more detail in a much-discussed LSE Impact post last year.

All that said, I’m not going to sit here and tell you nonprofits can’t be acquired…cos although that may be technically true, nonprofits can still be, in all-but-name, acquired. It’s just less common and harder.

So we like to also emphasize that the source code for these projects we are doing is open. That means that for any given project, its main asset–the code that makes our project work–is available for free to anyone who wants it. This makes us much less of an acquisition target. Why buy the cow when the code is free, as it were.

As a 501(c)3 nonprofit, we have a board of directors that helps keep us accountable and helps provide leadership to the organization as well. Past board members have included Cameron Neylon and John Wilbanks, with a current board of me, Heather, Ethan White, and Heather Joseph.  Heather, Ethan, John, and Cameron have each contributed mightily to the Open cause, in ways that would take me much longer than I have to fully chronicle (and most of you probably know anyway). We’re incredibly proud to have (and have had) them tirelessly working to help Impactstory stay on the right course. We think they are people who can be trusted.

Finally, and y’all can make up your own minds about this, I like to think our team has built up some credibility in the space. Heather and I have both been working entirely on open-source, open science projects for the last ten years, and most of that work’s pretty easy to find if you want to check it out. In that time, it’s safe to assume we’ve turned down some better-paying projects that aligned less closely with the open science mission.

So, being acquired?  Not in our future.  But growth sure is, through grants and partnerships and customer relationships and lots of hard work… all in the service of making scholcomm more open.  Stay tuned 🙂

We’re building a search engine for academic literature–for everyone


Huzzah! Today we’re announcing an $850k grant from the Arcadia Fund to build a new way for folks to find, read, and understand the scholarly literature.

Wait, another search engine? Really?

Yep. But this one’s a little different: there are already a lot of ways for academic researchers to find academic literature…we’re building one for everyone else.

We’re aiming to meet the information needs of citizen scientists, patients, K-12 teachers, medical practitioners, social workers, community college students, policy makers, and millions more. What they all have in common: they’re folks who’d benefit from access to the scholarly record, but they’ve historically been locked out. They’ve had no access to the content or the context of the scholarly conversation.

Problem: it’s hard to access the content

Traditionally, the scholarly literature was paywalled, cutting off access to the content. The Open Access movement is on the way to solving this: half of new articles are now free to read somewhere, and that number is growing. The catch is that there are more than 50,000 different “somewheres” on web servers around the world, so we need a central index to find it all. No one’s done a good job of this yet (Google Scholar gets close, but it’s aimed at specialists, not regular people. It’s also 100% proprietary, closed-source, closed-data, and subject to disappearing at Google’s whim.)

Problem: it’s hard to access the context

Context is the stuff that makes an article understandable to a specialist, but gobbledegook to the rest of us. That includes everything from field-specific jargon, to strategies for skimming to the key findings, to knowledge of core concepts like p-values. Specialists have access to context. Regular folks don’t. This makes reading the scholarly literature like reading Shakespeare without notes: you get glimmers of beauty, but without some help it’s mostly just frustrating.

Solution: easy access to the content and context of research literature.

Our plan: provide access to both content and context, for free, in one place. To do that, we’re going to bring together an open database of OA papers with a suite of AI-powered support tools we’re calling an Explanation Engine.

We’ve already finished the database of OA papers. So that’s good. With the free Unpaywall database, we’ve now got 20 million OA articles from 50k sources, built on open source, available as open data, and with a working nonprofit sustainability model.

We’re building the “AI-powered support tools” now. What kind of tools? Well, let’s go back to the Shakespeare example…today, publishers solve the context problem for readers of Shakespeare by adding notes to the text that define and explain difficult words and phrases. We’re gonna do the same thing for 20 million scholarly articles. And that’s just the start…we’re also working on concept maps, automated plain-language translations (think automatic Simple Wikipedia), structured abstracts, topic guides, and more. Thanks to recent progress in AI, all this can be automated, so we can do it at scale. That’s new. And it’s big.

The payoff

When Microsoft launched Altair BASIC for the new “personal computers,” there were already plenty of programming environments for experts. But here was one accessible to everyone else. That was new. And ultimately it launched the PC revolution, bringing computing into the lives of regular folks. We think it’s time that same kind of movement happened in the world of knowledge.

From a business perspective, you might call this a blue ocean strategy. From a social perspective (ours), this is a chance to finally cash the cheques written by the Open Access movement. It’s a chance to truly open up access to the frontiers of human knowledge to all humans.

If that sounds like your jam, we’d love your support: tell your friends, sign up for early access, and follow us for updates. It’s gonna be quite an adventure.

Here’s the press release.

Why the name “altmetrics” doesn’t imply replacement of citations (and other bicycling metaphors)


“Based on the name “alternative” metrics, you clearly think altmetrics can replace citations. That’s dumb.”

I (Jason) have heard this critique more times than I care to count. And on one level, I get it. If  you take an “alternate route,” you don’t take the original route, you take a different one. There’s a replacement. And completely replacing citation metrics with altmetrics is, I agree, dumb. That said, I actually believe altmetrics should complement citation, and I further think that the name “altmetrics” (for all its flaws) is compatible with this view. To explain, here’s an example:

I’m currently looking out the window at a street which includes both a lane for cars, and another lane for “alternate transportation,” a category that includes bicycles, skateboards, and scooters.

Although these “alternate” vehicles have many advantages over cars (cleaner, smaller, etc) the goal of city planners is not, as I understand, to replace automobiles with alternate transportation. Rather, the goal is to make it easy for commuters to use the most suitable vehicle for their particular trip. This in turn supports a more efficient infrastructure for the city as a whole. Making it easy for commuters to choose alternate transportation for a given trip is helpful, even though no one really expects bikes to completely replace cars in the city as a whole.

(As an aside: these “alternate” vehicles could probably have some other, more descriptive name….for instance, “smaller-more-efficient vehicles.” However, as a practical matter, cars are the default for now so bikes etc remain “alternatives” for now. This is also true of altmetrics, of course, which I often hear will someday be obsolete as a term, once it really catches on. To this I say: excellent. The sooner the better.)

Like bikes et al., altmetrics aren’t right for every use case, and never will be. Altmetrics can’t and shouldn’t replace citation metrics for every task. But they are much better tools than citation metrics for some tasks (for example, understanding the impact of research on populations that don’t write scholarly papers). Therefore, using altmetrics alongside citations will let us measure scholarly impact in a way that’s more efficient, nuanced, and comprehensive. Altmetrics are an alternative to the measurement gridlock that comes from over-reliance on citation metrics.

 

When will everything be Open Access?


OA continues to grow. But when will it be…done? When will everything be published as Open Access?

Using data from our recently published PeerJ OA study, we took a crack at answering that question. The data we’re using comes from the Unpaywall database–now the largest open database of OA articles ever created, with comprehensive data on over 90 million articles. Check out the paper for lots more details on how we assembled the data, along with assessments of accuracy and other goodies. But without further ado, here’s our projection of OA growth:

[Figure: growth of OA over time, observed and projected]

In the study, we found that OA is increasingly likely for newer articles since around 1990. That’s the solid line part of the graph, and is based on hard data.

But since the curve is so regular, it was tempting to extend it to see what would happen at the current rate of increase. That’s the dotted line in the figure above. Of course it’s a pretty facile projection, in that no effort has been made to model the underlying processes. #limitations #futurework 😀. Moreover, the 2040 number is clearly too conservative, since it doesn’t account for discontinuities–like the surge in OA we’ll see in 2020 when new European mandates take effect.
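A toy version of this kind of naive extrapolation: fit a straight line through some (year, % OA) points and extend it to 100%. The numbers below are invented for illustration, and least squares is our assumption here, not necessarily what the study did:

```python
def fit_line(points):
    """Ordinary least-squares fit y = a + b*x through (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical (year, % OA) observations -- made up, not the study's data.
obs = [(1990, 10), (2000, 20), (2010, 30), (2018, 38)]
a, b = fit_line(obs)
year_100 = (100 - a) / b  # extend the fitted line to 100% OA
print(round(year_100))
```

Exactly the kind of facile projection the post admits to: useful for a rough sense of direction, useless for predicting discontinuities like policy mandates.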

But while the dates can’t be known for certain, what the data makes very clear is that we are headed for an era of universal OA. It’s not a question of if, but when. And that’s great news.

Open Access, coming to a workflow near you: welcome to the year of Ubiquitous OA


Thanks to 20 years of OA innovation and advocacy, today you can legally access around half the recent research literature for free. However, in practice, much of this free literature is not as open as we’d like it to be, because it’s hard for readers to find the OA version.

OA lives on repositories and publisher websites. But very few people visit these sources directly to find a given article. Instead, people rely on the search tools that are already part of their existing workflows. Historically, these haven’t done a great job surfacing OA resources. Google, for instance, often fails to index OA versions, in addition to indexing content of dubious provenance. OA aggregators like BASE, CORE, and OpenAIRE aim to solve this by emphasizing OA coverage, but they require researchers to add a second or third search step to their existing workflows–something researchers have been reluctant to do.

So in addition to the well-known access problem, we also have a discovery problem. On the one hand there’s a healthy, efficient OA infrastructure in journals and repositories. On the other hand we have millions of individual readers doing their own thing. We need to connect these. We need to cover the last mile between the infrastructure and the individual user, and we need to make that connection easy and seamless and ubiquitous. Until we do, OA is writing a check it can’t fully cash.

But the news is good: over the last year, several efforts have emerged to cover that last mile. Our contribution was Unpaywall: an extension that shows a green tab in your browser on articles where there’s an OA version available.  Unpaywall has enjoyed lots of success, adding over 100,000 active users in under a year. Moreover, the backend database of Unpaywall (formerly called oaDOI) can be integrated into any number of existing tools, making it easier to spread OA content all over the place. For instance, we’re already seeing over a million uses every day from library link resolvers.

Our most recent integration takes this to a new level, and we’re so excited about it: thanks to a new partnership between Impactstory and Clarivate Analytics, data from Impactstory’s Unpaywall database is now live in the Web of Science, making it the first editorially-curated and publisher-neutral resource to implement this technology. Web of Science has been able to use Unpaywall data to discover and link to millions more OA records amongst their existing content.  This enables millions of Web of Science users around the world to link straight from their search results to a trusted, legal, peer-reviewed OA version—and they can also filter search results by the different versions of OA.

This is a big deal because abstracting and indexing (A&I) systems like Web of Science are currently the most important way researchers access literature.  And though it’s by no means the only A&I system out there, Web of Science is the most respected and most prevalent. Every month, millions of users access literature through Web of Science—and now, each and every one of them will see more OA options for articles they might not otherwise discover, right alongside subscribed content.  Every day. What a huge change from the days we had to convince folks that OA was legitimate at all! It’s a new era.

A new era: that’s not just a hyperbolic phrase. We think this year marks the turning of a new moment in the OA narrative. We’re moving out of the author-focused, advocacy-focused initial phase, and into a more mature era of ubiquitous Open Access, characterized by deep integration of OA into researcher workflows and value-add services built on top of the immense OA corpus. This is the era of user-focused OA.

As OA becomes the default state for published research, tools that centralize, mine, index, search, organize, and extract knowledge from papers suddenly become massively more powerful.  Integrations between Unpaywall and commercial services aren’t generating this new era, but they are one of its hallmarks. We’re not making new OA, but rather starting to leverage the massive OA corpus now available. In the last year, many others have begun to do this as well. Many, many more will follow.

For years, we in the OA advocate community have been arguing that a critical mass of OA would not just improve scholarly communication, it would transform it. This is finally beginning to happen, and we think this partnership with Web of Science is an early part of that transformation. Now, a subscription to Web of Science—something most academic libraries globally already have—is also a subscription to a database of millions of free-to-read OA articles.

We’ve never been more excited about the future of OA–or more thankful for all the work the OA community as a whole has done to get here. And we can’t wait to keep working together with the community to help make the vision of ubiquitous open access a reality.