agile meets distributed open source development

Total-impact will always be open-source. It is a pretty standard OSS project in its early stages: its core developers are geographically distributed, and contributions have been fueled by enthusiasm rather than paycheques. Karl Fogel's Producing Open Source Software gives a great overview of standard processes for these sorts of projects.

At the same time, Jason and I (Heather) are sold on the principles behind agile development: iteration, adaptation, tight feedback loops, simplicity.

OSS and agile methodologies have many similarities, but also some differences. In particular, agile development is most often practiced by co-located, dedicated teams of coders. As a result, we’ve been rolling our own process a bit. It is working well: it feels good, we aren’t spending too much time on process, and we recognize and change things when they aren’t working.

For example: we were keeping our sprint backlog in a google spreadsheet. Last sprint we moved to tracking sprint items as GitHub issues.  Although this makes it harder to do time estimation, it is easier to integrate into our workflows… currently a win.

Here is what our development process looks like right now:

  • two-week sprints, with a bit more flexibility in pre-determined scope than standard Scrum
  • developer conversations take place openly on the newly-formed total-impact-dev google group
  • weekly Skype calls with active developers (Richard, Mark, Heather, Jason) for start/mid/end sprint conversations
  • sprint issues in GitHub
  • product backlog in a google spreadsheet
A few things are still lacking; good ways to get customer feedback are chief among them! We’ll get there soon.
Sound fun?  It is 🙂  Join us!

latest Sloan grant revision

We’ve submitted a revision to our Sloan Foundation grant in response to comments and feedback from them, and to reflect some updated ideas we’ve had.

The biggest change is the budget. I’m close to full-time already because TI is my dissertation. But we’ve boosted Heather’s grant salary to the point where she’d only be 50% supported by her current postdoc, with the other half by the TI grant.

(Update: we received the grant! Read all about it here.)

12-month goals

As part of our Sloan Foundation grant process, we were asked to come up with some measurable outcomes. This ended up being a really valuable exercise, and I anticipate we’ll be checking back with these pretty regularly.

We expect not only to reach these goals by April 2013, but also that our chosen metrics will be increasing across the board. Here they are:

  • overall visibility: (50k visits, 30k unique visitors, 500 tweets, 30 blog posts, 60 github watchers, 20 forks)
  • scholars: embedding or linking to TI reports on their homepage/CV (n=100), some of whom present these in annual reviews or T&P packages (n=25)
  • publishers, repositories, and tools: embedding the total-impact widget on articles/datasets (15 organisations)
  • researchers: gathering data for research studies using TI (5 in-progress or published papers)

Of course, in keeping with our open and agile approach, we’ll likely end up modifying these some in response to experience and feedback from the community (if you’ve got ideas on how to improve these, we’d love to hear ‘em). But we reckon they’re a pretty good start.

What are metrics good for?

We talk a lot about metrics. And when you do that, there’s always the risk that what you’re measuring, or why, will become unclear. So this is worth repeating, as I was reminded in a nice conversation with Anurag Acharya of Google Scholar (thanks Anurag!).

Metrics are no good by themselves. They are, however, quite useful when they inform filters. In fact, by definition, filtering requires measurement or assessment of some sort. If we find new relevant things to measure, we can make new filters along those dimensions. That’s what we’re excited about, not measuring for its own sake.

These filters can mediate search for literature. They can also filter other things, like job applicants or grant applications. But they’re all based on some kind of measurement. And expanding our set of relevant features (perhaps a machine-learning context is more useful here than the mechanical filter metaphor) is likely to improve the validity and responsiveness of all sorts of scholarly assessment.
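To make the "metrics inform filters" idea concrete, here is a minimal sketch in Python. The metric names and weights are invented for illustration; they are not total-impact's actual features or scoring. The point is just that any measurable dimension can become a dimension of a filter:

```python
def filter_score(metrics, weights):
    """Combine several per-article metrics into one filter score.

    Each metric is weighted by how relevant the filter considers it;
    adding a newly measurable feature just means adding a weight.
    """
    return sum(weights.get(name, 0) * value for name, value in metrics.items())

# Hypothetical articles with hypothetical metric values
articles = [
    {"id": "A", "metrics": {"citations": 10, "tweets": 2, "mendeley_readers": 5}},
    {"id": "B", "metrics": {"citations": 1, "tweets": 40, "mendeley_readers": 30}},
]

# One possible filter: weights chosen arbitrarily for this sketch
weights = {"citations": 1.0, "tweets": 0.1, "mendeley_readers": 0.5}

# The filter ranks articles along whichever dimensions it cares about
ranked = sorted(articles, key=lambda a: filter_score(a["metrics"], weights),
                reverse=True)
```

A different filter is just a different set of weights over a (possibly larger) set of measured features; in a machine-learning framing, those weights would be learned rather than hand-set.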

The big question, of course, is whether altmetrics like tweets, mendeley, and so on are actually relevant features. We can’t yet prove that one way or the other, although we’re working on it. I do know that they’re relevant sometimes, and I suspect they will become more relevant as more scholars move their professional networks online (another assumption, but I think a safe one).

And of course, measuring and filtering are only half the game. You also have to aggregate, to pull the conversation together. Back when citation was the only visible edge in the network, we used ISI et al. to do this. Of course the underlying network was always richer than that, but the citation graph was the best trace we had. Now the underlying processes—conversations, reads, saves, etc.—are becoming visible as well, and there’s even more value in pulling together these latent, disconnected conversations. But that’s another post 🙂

total-impact at SPARC OA

I (Jason) presented TI at the SPARC Open Access Meeting in Kansas City last week. It was an interesting event, with a mix of in-the-trenches librarians, publishers, institutional repository folk, and business people representing the growing range of products and services springing up around Open Science. I found the engagement and growth of this latter group encouraging, since it’s where we see TI ending up.

There was ample skepticism about TI: big publisher reps were very interested, but non-committal, giving the sense that they want to see a more stable legal/org entity before they take the SaaS plunge. Many librarians had initial “it’s a toy, my faculty care only for IF” reactions, although these tended to thaw after more explanation. Both reactions underscore for me the importance of 1) establishing a trustworthy legal identity for TI and 2) continuing to do outreach and research around the idea of altmetrics in general.

There was a lot of encouraging enthusiasm, as well. The TI poster was mobbed. Several repositories expressed heavy interest in embedding TI stats, and some libraries were interested in contributing plugins. It was great to hear folks say “someone is finally doing this… it’s just what we’ve been wanting!”

Another highlight was a great chat with John Wilbanks; the more I hear him talk, the more his open-sci insight and knowledge impresses me. Turns out he’s been keeping well abreast of TI and altmetrics, and has good things to say about total-impact’s future prospects, which was great to hear. I also got a chance to talk with smart folks from the Kauffman Foundation at a dinner they set up; they had thoughtful things to say about the Value Of Entrepreneurship, and why a for-profit startup could be the best structure for even a mission-driven project like us.

So, overall an educational and useful event. Thanks to SPARC for putting it on!

What’s the market for total-impact?

We’ve been thinking a lot lately about how TI can make money to support infrastructure and future development. We’re still unsure what kind of organisational structure will best serve the goals we’ve got for TI (a conventional startup, a B Corp, a non-profit foundation, or something else?). But we’re increasingly sure that there’s a clear business model to support whatever we come up with, built around selling high-volume access to the data we collect.

In the short term, TI is nicely positioned at the intersection of social-media analytics (Gnip, Radian6, etc) and scholarly impact (ISI, Scopus, etc). The great things about this confluence from a market perspective are:

  • There are a bunch of specialized, academia-specific sources (insulating us from competition from Gnip et al).
  • The institutional inertia of big players like the ISI and Scopus makes them unlikely competitors in the short- to medium-term. They see this stuff as toys. Once they don’t, it’ll take time to develop relationships with many providers, relationships we’ve been building for the last year.
  • As budgets tighten, there’s a growing clamor for more and better metrics to support funding applications. There’s growing dissatisfaction with the IF, yet the culture is already comfortable with (addicted to?) making decisions based on popularity in social sharing networks (which, at its core, is what the literature is).

Of course there are challenges here; the IF is a very well-established brand, and citations are the coin of the realm. But there’s also a growing sense that the Web opens exciting possibilities for scholcomm that we scholars are letting slip away.

The scholarly impact market—which ultimately drives the whole scholarly enterprise—is based on 1960s data sources and 1960s technology. It’s ripe for disruption. And selling this new data with a low-risk SaaS model is the perfect way to get that disruption started.

Over the long term, sources like TI can be the infrastructure upon which all of science is built. Future researchers won’t read journals, but rather customised feeds, filtered using data gleaned from their social networks. As a provider of that data, total-impact sits atop a powerful revenue stream, as well as helping to push science into the future.

total-impact gets £17,000 support from OSF

total-impact has come full circle: we were born out of a hackathon thrown by the Beyond Impact project, funded by the Open Society Foundations. Now we’re being generously supported from that same Beyond Impact grant, helping us move from prototype to a reliable, scalable service.

The budget on the grant is pretty simple: £16k goes to open-source devs Cottage Labs to help build out Jean-Claude, our next release. The other £1k flew me here to Edinburgh to work with CL for a week. There are more details in our grant application; in keeping with our radical transparency philosophy, we are posting that as a gDoc here, so you can see more specifics.

It probably goes without saying that we’re really excited about this grant…it’s a great chance to work with Cottage Labs, a great vote of confidence from Beyond Impact, and a great push toward our goal of revolutionizing the dissemination and evaluation of scholarship.

Special thanks to Cameron Neylon for his vision and leadership in setting up the original workshop, for suggesting we apply for funding, and for helping us along when we did.

Jean-Claude tools

Here are the main libraries and other tools we’re using for Jean-Claude. Some are new for TI (Flask), some are old (Couch), and all are awesome:

  • Python 2.7
  • jQuery on the UI
  • CouchDB with the Python couchdb lib.
  • Flask as the lightweight web framework
  • BibJSON to describe bibliographic information
  • Supervisor to run and manage server processes
  • Beautiful (Stone) Soup for parsing XML and HTML
  • The Negotiator for content negotiation in the API
  • Memcached or maybe Redis for caching
  • JSONP? for serving API data
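To show how a couple of these pieces fit together, here is a minimal sketch of the JSONP pattern we might use for serving API data. The function names and payload are illustrative, not total-impact's actual API; the same logic would sit inside a Flask view, wrapping the JSON response in a callback when a `?callback=` parameter is present so widgets on other domains can load it via a script tag:

```python
import json

def jsonp_response(data, callback=None):
    """Serialize `data` as JSON; wrap it in a JSONP callback if given.

    Returns (body, mimetype). With a callback, the body is a JavaScript
    function call the browser can evaluate cross-domain; without one,
    it's plain JSON for same-origin clients.
    """
    payload = json.dumps(data)
    if callback:
        # JSONP: e.g. handleMetrics({"tweets": 12})
        return "%s(%s)" % (callback, payload), "application/javascript"
    return payload, "application/json"

# Plain JSON for same-origin API clients (payload values are made up)
body, mimetype = jsonp_response({"tweets": 12})

# JSONP for cross-domain embeds, e.g. a widget on a publisher's page
body_p, mimetype_p = jsonp_response({"tweets": 12}, callback="handleMetrics")
```

In Flask this would be a one-line wrap of the view's return value, with the callback name read from `request.args`.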