Reader Comments


One 'killer' app is already here.

Posted by scekker on 07 Apr 2008 at 16:46 GMT

Dear Dr. Bourne,

I could not agree more with you about the long-term potential of open access - that is, the ability to connect to the entire content of an article.

I want to mention that there IS a fantastic 'killer' app, already commercially available, that can mine this data, at least locally, and that works readily with PDF documents. It's called Spotlight, and it comes as part of the Mac OS X operating system.

If you have not used it, I recommend you borrow a Mac, load up 100 of your favorite articles as PDF reprints, and then start mining that database with Spotlight. It's amazing. Gone are the days when I had to religiously name my PDF documents by author and date. Instead, I can search all articles for specific scientific content. It works well with gene names as queries, because 'pax-2' does not tend to appear in other content on my computers besides grants and emails.

If 'all' we had was Spotlight for PubMed Central, much of your dream would become a reality. In some ways, having Spotlight on my Mac is just window shopping, since it only taps into my local database; who knows when this technology will be deployed across an archive of all our research articles.

But I look forward to that day.

RE: One 'killer' app is already here.

philip_bourne replied to scekker on 13 Apr 2008 at 17:58 GMT

Hi:

In Windows XP you can do the same thing. The difference, as I understand it, is that Spotlight indexes files in advance, so it is faster and prettier to use, but all it does is full-text search.
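To make the "indexes things in advance" point concrete, here is a minimal sketch of an inverted index in Python. It is not how Spotlight or Windows search is actually implemented; the tokenizer, the function names, and the toy documents are all hypothetical. The index is built once, so a query becomes a few dictionary lookups instead of a pass over every file, which is where the speed comes from. Note that it still only answers "which documents contain this string".

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of letters/digits, keeping internal
# hyphens so that gene names such as "pax-2" stay a single term.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_index(docs):
    """Map each term to the set of document ids containing it (built once, in advance)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def scan(docs, query):
    """The un-indexed alternative: re-read every document at query time."""
    terms = tokenize(query)
    hits = set()
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if terms and all(t in words for t in terms):
            hits.add(doc_id)
    return hits


# Toy "PDF reprints" (hypothetical contents).
docs = {
    "a.pdf": "Pax-2 expression in the developing kidney",
    "b.pdf": "Crystal structure of PDB entry 1q11",
    "c.pdf": "Pax-2 and Pax-8 in otic development",
}
index = build_index(docs)
print(sorted(search(index, "pax-2")))  # ['a.pdf', 'c.pdf']
```

Both `search` and `scan` return the same answers; the index only changes how much work happens at query time.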

If the dream is to have fast full-text searching over PDFs and the like, the task is simple. Buy a Google Search Appliance (http://www.google.com/ent...) and have it index PMC or whatever else is desired, and you have your super-fast full-text searching over PDFs, Word docs, etc.

But the moment you want something more advanced than "find me things that contain the string 'pax-2'", such as "find me things that talk about the PDB id 1q11 and not things talking about the 1q11 chromosomal region" or "find me papers by authors that cite the Wiggle paper talking about X", you will need more than full-text search. Perhaps I am missing something here?
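The 1q11 example shows why string matching is not enough: "1q11" is both a well-formed PDB ID (a digit followed by three alphanumerics) and a chromosome band (chromosome 1, long arm q, band 11). Below is a deliberately crude Python sketch that guesses which one a mention refers to from nearby cue words. The cue lists, window size, and function names are hypothetical; a real text-mining system would more likely rely on trained entity recognizers and curated vocabularies.

```python
import re

# Hypothetical cue words; a real system would learn or curate these.
PDB_CUES = {"pdb", "structure", "crystal", "entry", "coordinates", "protein"}
CHROMOSOME_CUES = {"chromosome", "chromosomal", "region", "band",
                   "deletion", "duplication", "locus"}

WORD = re.compile(r"[a-z0-9]+")


def classify_mention(text, mention, window=5):
    """Label each occurrence of `mention` as 'pdb', 'chromosome' or 'unknown'."""
    words = WORD.findall(text.lower())
    target = mention.lower()
    labels = []
    for i, w in enumerate(words):
        if w != target:
            continue
        # Look at up to `window` words on either side of the mention.
        context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
        pdb_score = len(context & PDB_CUES)
        chrom_score = len(context & CHROMOSOME_CUES)
        if pdb_score > chrom_score:
            labels.append("pdb")
        elif chrom_score > pdb_score:
            labels.append("chromosome")
        else:
            labels.append("unknown")
    return labels


def papers_about_pdb_entry(docs, pdb_id):
    """Keep documents where at least one mention reads as a PDB ID."""
    return {doc_id for doc_id, text in docs.items()
            if "pdb" in classify_mention(text, pdb_id)}


# Hypothetical snippets: plain string search matches both.
docs = {
    "structure.pdf": "We compared the crystal structure of PDB entry 1q11 with",
    "genetics.pdf": "A microdeletion in the 1q11 chromosomal region was found in",
}
print(sorted(d for d, t in docs.items() if "1q11" in t))  # ['genetics.pdf', 'structure.pdf']
print(papers_about_pdb_entry(docs, "1q11"))               # {'structure.pdf'}
```

Even this toy version needs knowledge beyond the raw string, which is the point being made: a full-text index alone cannot tell the two meanings apart.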