InterMine Mobile


BlueGenes development is at the point where we need to store BlueGenes-specific data in a database. This is an important step because it paves the way for customisation, branding, tool configuration, and an enhanced My Data section that lets users manage all of their InterMine assets.

There are a few architecture and design decisions that need to be made now, and made correctly. In particular: OAuth2 authentication. If you’re up to speed on how InterMine and BlueGenes authenticate, feel free to skip to the bottom.

Background

The current InterMine web application is a monolith. Users log in to the UI with a username and password, and their identity is stored in memory on the server (called the “session”). When they perform a query or upgrade a list, the JSP code sends messages to the Java layer along with the user’s identity, which is used to retrieve data from the object store and user profile.

Figure 1

Everything you see in InterMine today lives somewhere layered between the JSP Web App and the Object Store.

BlueGenes works differently. It communicates with the Java layer, object store, and user profile entirely through web services known as the InterMine API. No exceptions. This cleaves the dependency between the visual tools that we develop and the lower level operations of InterMine such as handling queries.

When Sally views her list page in BlueGenes, the workflow looks more like this:

Figure 2

BlueGenes lives in the browser, not on the server. InterMine’s web services respond with raw data about her lists in JSON format and BlueGenes renders the page in the browser. This is equivalent to running Python scripts in your console to fetch your lists, resolve IDs, perform a search, etc.

Web services (InterMine or otherwise) are stateless by design. They can’t tell if requests are made by a new user or a revisiting one. In order for a web service to authorise a user the request must contain some sort of secret token as seen in Figure 2. Like any good web application, InterMine provides web services for authenticating a user and retrieving their identity token which can be used in future requests rather than a username and password.
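
As a rough illustration of what “the request must contain a token” means in practice, here is a minimal Python sketch that builds (but does not send) such a request. The host name, the `/lists` path and the `Authorization: Token …` header format are assumptions made for the example, not something this post specifies, so check your own mine’s web service documentation before relying on them.

```python
import urllib.request

# Hypothetical values, for illustration only.
MINE_SERVICE_ROOT = "https://mine.example.org/service"
API_TOKEN = "sallys-api-token"


def build_list_request(service_root: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for the user's lists."""
    url = f"{service_root}/lists?format=json"
    request = urllib.request.Request(url)
    # The service keeps no session, so the token has to travel with every request.
    # The header format used here is an assumption.
    request.add_header("Authorization", f"Token {token}")
    return request


request = build_list_request(MINE_SERVICE_ROOT, API_TOKEN)
print(request.full_url)
print(request.get_header("Authorization"))
```

Nothing about the request identifies Sally other than the token, which is why it has to be treated as a secret.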

BlueGenes Authentication

Now it gets a bit trickier. BlueGenes has its own small web server to provide the actual JavaScript application, and it requires database access to store BlueGenes-specific information such as additional MyMine data, tool config, etc. It really looks more like this:

Figure 3

A user can authenticate using InterMine’s web services via the browser, but if they want to save user-specific data to the BlueGenes database using the BlueGenes web services then they need to provide an identity. BlueGenes does not have access to the user profile directly, so the authentication request needs to be piped through the BlueGenes server.

Figure 4

When Sally logs into BlueGenes she provides her username and password, which are sent to the BlueGenes server rather than the InterMine server. If BlueGenes successfully authenticates as Sally then it sends her back her InterMine API token embedded in a signed JSON Web Token (JWT). All future requests between BlueGenes and InterMine will contain her API token, and all requests to the BlueGenes server will contain the signed JWT.
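
To make “API token embedded in a signed JWT” a little more concrete, here is a minimal sketch using only the Python standard library. It assumes HMAC-SHA256 (HS256) signing and invented claim names (`user`, `api_token`); the post does not say which algorithm or claims BlueGenes actually uses, and a real server would more likely use an established JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def read_payload(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical claim names and secret, for illustration only.
token = sign_jwt({"user": "sally", "api_token": "sallys-api-token"}, b"bluegenes-secret")
print(read_payload(token)["api_token"])
```

Note that signing is not encryption: anyone holding the JWT can read the payload. The signature only lets the BlueGenes server confirm that it issued the token itself.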

It sounds a bit complicated, but this only happens when logging in and remains hidden from the user. This configuration protects BlueGenes from storing passwords and doesn’t require direct access to the user profile.

The problem: OAuth2 Authentication

Logging into InterMine using your Google account uses the OAuth2 framework. For it to work you must configure Google’s developer console with a hardcoded URL that redirects users back to the application after they’ve authenticated. This redirection page is given a token that is exchanged by the servers for the user’s Google identity (email address and Google ID). We can do the same in BlueGenes:

We put a Google Sign-In button in BlueGenes.

Sally clicks it and is redirected to Google.

Upon authentication Sally is sent back to BlueGenes with an authentication token.

BlueGenes server exchanges the token for Sally’s Google ID.
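
The redirect in the second step can be sketched as follows. This is only an outline of the standard OAuth2 authorisation-code request to Google; the client ID, the callback URL and the helper name are made up for the example.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_google_login_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL the Sign-In button sends the user to."""
    params = {
        "client_id": client_id,
        # Must match the URL registered in Google's developer console.
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "openid email",
        # Random value echoed back by Google, used to reject forged callbacks.
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


# Hypothetical client ID and callback URL.
login_url = build_google_login_url(
    "my-client-id.apps.googleusercontent.com",
    "https://bluegenes.example.org/auth/google/callback",
    secrets.token_urlsafe(16),
)
print(parse_qs(urlsplit(login_url).query)["redirect_uri"])
```

The hardcoded redirect URL is the part that matters for the rest of this post: Google will only send the user back to an address that was registered in advance.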

So far so good. She can update her tool configurations and tags which are stored in the BlueGenes database.

Now Sally wants to save a list which is an action performed in InterMine, not BlueGenes. This requires an API token which she doesn’t yet have.

She can’t authenticate with InterMine using a username and password because she doesn’t have one (she’s a Google user).

She has no way of exchanging her Google ID with InterMine’s web services for an API token because InterMine has no way of trusting who she is. Anyone could access the end point and get a user’s API token if they knew their Google ID.

BlueGenes can’t fetch her API token from the user profile because it doesn’t have access (by design).

There are a few workaround solutions, but they couple BlueGenes to a single InterMine instance to varying degrees.

Solution 1: JWTs and sharing secrets

InterMine server gets a new end point that accepts a user ID and a JSON Web Token. The user’s API token is returned only if the signature on the JWT is valid.
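
A minimal sketch of the check such an end point might perform, assuming HMAC-SHA256 signatures and an in-memory dictionary standing in for the user profile (both are assumptions for illustration, as are all the names used):

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical stand-in for the user profile: user ID -> InterMine API token.
API_TOKENS = {"google-id-123": "sallys-api-token"}


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(signing_input: str, secret: bytes) -> str:
    """Base64url HMAC-SHA256 signature of the 'header.payload' string."""
    return b64url(hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest())


def signature_is_valid(jwt: str, secret: bytes) -> bool:
    parts = jwt.split(".")
    if len(parts) != 3:
        return False
    header, payload, signature = parts
    expected = sign(f"{header}.{payload}", secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


def api_token_for(user_id: str, jwt: str, mine_secret: bytes) -> Optional[str]:
    """Return the API token only if the JWT was signed with the mine's secret."""
    if not signature_is_valid(jwt, mine_secret):
        return None
    return API_TOKENS.get(user_id)


unsigned = ".".join(
    b64url(json.dumps(part).encode("utf-8"))
    for part in ({"alg": "HS256", "typ": "JWT"}, {"sub": "google-id-123"})
)
shared_secret = b"known-to-both-servers"
trusted_jwt = f"{unsigned}.{sign(unsigned, shared_secret)}"
third_party_jwt = f"{unsigned}.{sign(unsigned, b'some-other-secret')}"

print(api_token_for("google-id-123", trusted_jwt, shared_secret))
print(api_token_for("google-id-123", third_party_jwt, shared_secret))
```

The second call returns nothing because the third party signed with a different key. A real end point would presumably also check that the JWT payload names the same user and has not expired; the sketch checks the signature only.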

Pain point: Both BlueGenes server and InterMine server will need matching secret keys. A third party cannot host their own BlueGenes and point it at a remote mine while supporting OAuth2 without knowing that mine’s secret key (aka access to all accounts).

InterMine admins could potentially whitelist third party instances of BlueGenes by generating secret keys for them, but this would be an active process of curation and would still give third parties full access to all Google accounts.

InterMine has a URL redirect for Google authentication. It accepts a URL of a BlueGenes instance and generates a link with an embedded API key.

1. A user clicks Google Login on BlueGenes and is redirected to Google.

2. After authenticating, the user is redirected back to the BlueGenes server.

3. BlueGenes generates a JWT containing the user’s identity.

4. A mandatory button is then shown to “Authorise My Account to use Remote Data Sources” (which means InterMine server).

5. Clicking the button sends the user to a /service/google-auth end point on the remote mine with a return_to parameter containing the URL of BlueGenes.

6. The return_to parameter is stored in the session and the user is sent back to Google Login, where they authorise for the second time.

7. After authenticating, the user is redirected to an InterMine /service/google-auth-redirect end point.

8. The /service/google-auth-redirect page automatically redirects the user back to the BlueGenes URL stored in the session, with the API token as a parameter.

A workflow would look something like this:

There are quite a few steps, but steps 5+ are automatic.

Pain point: Users will have to authenticate twice the first time they log in to BlueGenes, but we can make this as painless as possible. Also, if an admin is running both an InterMine server and a BlueGenes server then they’ll need two OAuth2 projects in their Google developer console (also a one-time activity).
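
The two redirects that carry the return_to value and the API token could be built roughly like this. The /service/google-auth path and the return_to parameter come from the description above; the host names, the helper names and the name of the final `token` parameter are assumptions for the sake of the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorise_url(mine_root: str, bluegenes_url: str) -> str:
    """URL behind the "Authorise My Account" button: sends the user to the mine."""
    query = urlencode({"return_to": bluegenes_url})
    return f"{mine_root}/service/google-auth?{query}"


def build_return_url(return_to: str, api_token: str) -> str:
    """URL the mine redirects to at the end, handing the API token to BlueGenes."""
    separator = "&" if urlsplit(return_to).query else "?"
    # The parameter name "token" is hypothetical.
    return f"{return_to}{separator}{urlencode({'token': api_token})}"


# Hypothetical hosts, for illustration only.
authorise_url = build_authorise_url(
    "https://mine.example.org/mymine", "https://bluegenes.example.org/"
)
return_url = build_return_url("https://bluegenes.example.org/", "sallys-api-token")

print(authorise_url)
print(return_url)
```

Because the mine only ever redirects to whatever BlueGenes URL it was given, it needs no prior knowledge of that BlueGenes instance, which is what keeps the two decoupled.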

Solution 4: Outsource

We use a third party single sign-on vendor such as https://auth0.com/

Pain point: We can’t guarantee that InterMine admins will remain within the Terms of Service for their free offering to open source projects. Otherwise it’s very expensive.

Solution 3 seems to be the most feasible and keeps InterMine and BlueGenes completely decoupled. (Thanks, Yo!)

2017’s developer conference has been and gone; time to pay my dues in a blog post or two.

Day 0: Welcome dinner, 29 March 2017

The Cambridge InterMine team arrived at Walnut Creek without a hitch, and after a jetlagged attempt at a night’s sleep we sat down to a mega-grant-writing session in the hotel lobby, fuelled by several pots of coffee and plates of nachos.

By 7PM, people had begun to gather in the lobby to head to the inaugural conference dinner at the delicious Walnut Creek Yacht Club. We had to change the venue quite late on in the game, meaning we decided to wander down the street to collect some of the InterMiners who had ended up at the original venue (sorry!!). By the end of the meal, most of the UK contingent was dead on their feet – 10pm California time worked out to be 6am according to our body clocks, so when Joe offered to give several of us a lift back to the hotel, it was impossible to decline.

Day 1: Workshop Intro

The day started with intros from our PI, Gos, and our host, David Goodstein.

Short community talks

Joel gave a great presentation about Doppelgangers in InterMine – that is, occasionally, depending on your data sets and config, you can end up with duplicate or strange / incomplete InterMine objects in your mine. He followed up with explanations of the root causes and mitigation methods – a great resource for any InterMiner who is working on data source integration!

Next up was Sam’s talk about his various beany mines, including CowpeaMine, which has only genetics data, rather than the more typical InterMine genomic data. He’s also implemented several custom data visualisations on gene report pages – check out the slides or mines for more details.

Vivek focused on some great cross-InterMine collaborations (slides here), including the technical challenges integrating JBrowse into InterMine, as well as a method to link to other InterMines using synteny rather than InterMine’s typical homology approach.

Joe has the privilege to run the biggest InterMine, covering (currently) 72 data sets on 69 organisms. Compared to most InterMines, this is massive! Unsurprisingly, this scale comes with a few hitches many of the other mines don’t encounter. Joe’s slides give a great overview of the problems you might encounter in a large-scale InterMine and their solutions.

Joe talks about how PhytoMine handles having multiple versions of the same genome – not something InterMine natively handles.

Better Findability (the F in FAIR) by registering InterMine resources with external registries

RDF generation / SPARQL querying

This was followed up by Daniela’s introduction to RDF and SPARQL, which explained the two concepts from the basics in an easily understood manner. I really loved these slides, and I reckon they’d be a good introduction for anyone interested in learning more about what RDF and SPARQL are, whether or not you’re interested in InterMine.

If so, who is involved? Developers, community members, curators, other?

Homologue or homolog? Who knew a simple “ue” could cause incompatibility problems? Most InterMines use the “ue” variation, with the exception of PhytoMine. An answer to this problem was presented in the “friendly mine” section of Vivek’s talk earlier in the day.

Another great output was Siddartha Basu’s gist on setting up InterMine – outlining some pain points and noting the good bits.

Most of us met up for dinner afterwards at Kevin’s Noodle House – highly recommended for meat eaters, less so for veggies.

This went well despite a server-room meltdown which conveniently timed itself for the morning of the same day (the training session was in the afternoon, so we thankfully had time to get the servers back up!).

In contrast to previous years, every single hand went up when we asked if the participants wrote code as part of their job. Next time, we will try to allow for a longer session on using InterMine web services, rather than the 15 minute slot we allocated this time!

Developer Workshop and Hackathon: 5 days in sunny California, spending time with InterMiners from around the world. Longer blog posts to follow, but in the meantime you can browse the agenda for links to slides from each session, or the storify summary of tweets.

Google Summer of Code: We’re participating in Google Summer of Code (GSoC) this year (previously) as a mentoring organisation. We had over 50 interested students and 30 distinct applications, many of which were simply brilliant. The deadline for students applying, naturally, was the day after the hackathon, making finding time to provide student feedback a challenge. Maybe there’s a reason to be grateful for jet-lag induced wakefulness at odd hours!

Grants: A tale of two grants…

New application: We had a grant application deadline that was, once again, the day after the hackathon. Uh-oh! Feverish figure fixes, tentative typo tweaks and word-count winnowing were squeezed in at every opportunity.

Good news about an old application: Meanwhile, we got the news that we’d been fortunate enough to have our hard work pay off: a grant we’d applied for last year as part of the BBSRC BBR 2016 call was agreed to! Hint: the future of InterMine is looking very FAIR, possibly even SPARQLing. More details in a later post.

We’ve decided to streamline our blogging experience a little bit. Rather than maintaining several separate but mostly similar blogs for HumanMine, FlyMine, and InterMine, this blog will act as a combined stream.

Don’t worry – this doesn’t mean you’ll be forced to view irrelevant updates if you’re only interested in one of the sub categories. WordPress is great about filtering via tag or category. Here are a few quick links:

We have a brand new blog and so would like to take this opportunity to tell you our grand plans for 2016.

InterMine 2.0

Gradle

Currently InterMine is built with a series of ant commands, and dependencies are managed manually. This of course is not ideal, and we plan to use Gradle to replace Ant and manage our dependencies automatically. This change will make builds faster, easier and more efficient.

For those of you with InterMines of your own, this means that you will use different commands for building your databases and deploying your webapps. We’ll provide the new commands along with documentation, and aim to make the transition as easy as possible.

Keyword Search

We currently use Lucene for our search index but plan to greatly expand our utilisation of this great library — making search on InterMine more robust, sensitive and powerful.

The Cloud

Some have already deployed their InterMine to the cloud. We intend to make this process much easier, probably by creating a custom InterMine buildpack which pre-configures a Docker container with all of InterMine’s dependencies.

New Data Sources

We are always adding new data sources and would like to hear your suggestions. On our list right now is:

And of course we will continue to update our current data source library as file formats and data change.

New User Interface

We’ve developed a new user interface which should be ready for beta testing in early 2016. It’ll exist alongside the current interface for some time, allowing you to feed back ideas, suggestions, and critiques in the new interface, whilst still being able to rely on the old one.

Here’s a sneak preview (subject to plenty of change, of course!):

Sneak preview: Homepage for the (work-in-progress) InterMine 2.0 UI.

New Tools

To go along with our new interface, we’re going to be adding a lot of new tools for you to use. Our wish list so far (not in order of priority):

Advanced Search / Query builder / Guided search

Recommendation engine (which gene is like my gene?)

Complex Interaction viewer

More powerful region search

Phenotype viewer

InterMine search tool

R plug-in

Text mining tool

JBrowse / other genome browsers

UniProt protein browser

…

We’d like to hear which tools are important to you. We will also improve the tools we currently have, making them easier to adapt to your data sets.

2017 and beyond

Genomes are being sequenced every day, technology is moving at an ever more rapid pace and everyone is facing a challenging funding environment. We don’t know quite what the world will look like in the next five years but we are working hard to be future proof. We’ve always had a deep commitment to openness, flexibility and collaboration, and feel that this will help us meet any future challenges.

Towards this end, we are running a pilot program to test out various graph databases and to explore the semantic web. We will keep you posted on our progress as always, and would like to hear your thoughts.

Thanks to our great community for all of their support over the years! We look forward to a really exciting year!