20 comments to JabRef + automatic metadata extraction from PDF files (like Mendeley)

  • Nikolay Karelin

    It’s frustrating:

    Why did you create a customized JabRef rather than a plugin? It will require additional effort from the JabRef developers to decide on and integrate it… By the way, have you contacted the JabRef team to discuss your update?

    The other thing is the metadata extraction server. You see, not every research team has a good internet connection, and, more importantly, some teams are simply not allowed to share their paper collection with third parties. There is a rather interesting ‘queue of requests’ on the Zotero site – http://forums.zotero.org/discussion/3574/ – lots of requests and just a few answers about a local server.

    • hello,

      We are talking with the JabRef team, and it looks as though our changes will be integrated into the official JabRef, so there is no further need for a plugin or a special version.

      I understand that you would prefer to have the metadata extracted locally. But:

      - Our tools for analysing PDFs are not platform independent and therefore could not be easily integrated into JabRef.

      - We are constantly improving our algorithms (for instance, in the last 48 hours we made one bug fix and one other improvement). With a webservice we just have to change our tools on the server and that’s it. If our tools were integrated directly into JabRef, we would have to release new versions of JabRef every few days.

      - We have not developed our tools for JabRef only but for our project Mr. dLib, and the main concept of Mr. dLib is to offer metadata and services for academic websites and tools via a webservice. So we have little intention of releasing libraries, for which we would have to write detailed documentation before others could use them. Instead we will offer an easy-to-use webservice that can be used by JabRef and others.
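To illustrate the webservice argument above, here is a minimal sketch of what a client-side upload could look like. The thread does not describe the actual Mr. dLib interface, so the endpoint URL, the form field name `file`, and the function name `build_upload_request` are all assumptions made for illustration only.

```python
import uuid
import urllib.request

# Hypothetical endpoint: the thread does not give the real service URL.
SERVICE_URL = "https://example.org/metadata"


def build_upload_request(pdf_bytes, filename, url=SERVICE_URL):
    """Build (but do not send) a multipart/form-data POST carrying one PDF."""
    boundary = uuid.uuid4().hex
    # The form field name "file" is an assumption, not a documented parameter.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + pdf_bytes + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Build a request for a tiny placeholder payload; nothing is sent here.
request = build_upload_request(b"%PDF-1.4 placeholder", "paper.pdf")
```

Sending the request would be a single `urllib.request.urlopen(request)` call. The point of the design is that a client like this stays unchanged while the extraction tools on the server are improved.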

  • Hi, it’s William Gunn from Mendeley. We’re sorry to hear that Mendeley doesn’t work well with the Sciplore MindMapping tool, but there’s no point in re-inventing the wheel. I’m just thinking now that if we worked together we could add our strength in reference extraction and you could continue to focus on what you do best. Would you like to drop me an email to discuss this further?

    • Hello William,

      thank you for your interest in SciPlore MindMapping and our webservice. There are two problems SciPlore MindMapping has with Mendeley:

      1. Mendeley is using a proprietary data format for references. I know, Mendeley can automatically create a BibTeX file but SciPlore MindMapping cannot write into this BibTeX file because it would be overwritten by Mendeley the next time.

      2. Mendeley is using a proprietary data format for PDF bookmarks, so SciPlore MindMapping cannot import these bookmarks.

      For these reasons we recommend JabRef. We already have the tools for extracting metadata because of our search engine http://www.SciPlore.org, so integrating them into JabRef is no big effort for us.

      • Stephen Fujiwara

        I’ve recently started using both SciPlore and Mendeley.

        I think it would be great if both teams worked together!

  • Andrewp

    I have been using Sciplore Mind Mapping to manage PDF files for my Masters Degree for the past year. I have found that the reality is very few pdf files from the various online databases have metadata, not even the title. So I am not sure how useful this will be for those in non Computer Science fields. What I do is:

    1. Download pdf files that I find from online databases into a directory that is monitored from my Sciplore Mind Map
    2. Change the pdf filename into the title of the pdf paper (the document title)
    3. Create bookmarks as needed within the pdf file
    4. Download the citation information for the pdf that is usually available on the online database, and import it into Endnote for referencing.

    Now what would be useful is:

    1. If Sciplore Mindmap could retrieve the title field from those pdfs which do have it in metadata, and display that title in the mindmap rather than the original filename, this would save step 2 above which can be a pain with long titles.

    2. If the same downloaded file that contains the citation information (for importing into Endnote) could also be drag and dropped into the relevant pdf in the MindMap, that would be sensational.

    While I would consider moving from Endnote to JabRef if it could give me additional functionality over Endnote, it seems to me that Endnote is fully featured, intuitive and straightforward to use (and the uni provides it for free), so there would need to be a compelling reason.

    • Thank you for your suggestions. Regarding the metadata extraction: I am not talking about XMP metadata (metadata written by the online databases into the file). I am talking about extracting the data directly from the PDF’s full text. This can be done with any PDF that is not a scanned image but contains “real” text. Accordingly, we will be able to provide at least some metadata for almost any PDF. Of course, our algorithms are far from perfect, but I estimate that the title is extracted correctly in 70-80% of cases, and the author(s) and the abstract in about 50%.
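The reply above does not say how the title is found in the full text. As a rough illustration only, one common heuristic is to take the lines on the first page that are set in the largest font. The sketch below assumes the first page has already been converted into `(text, font_size)` pairs by some PDF text extractor, which is not shown; the function name `guess_title` is hypothetical.

```python
def guess_title(lines):
    """Guess a title from first-page lines given as (text, font_size) pairs.

    Heuristic (an assumption, not the method used by the service in the
    thread): the title is the first run of consecutive lines set in the
    largest font size found on the page.
    """
    # Ignore empty or very short fragments such as stray page numbers.
    candidates = [(text.strip(), size) for text, size in lines
                  if len(text.strip()) >= 3]
    if not candidates:
        return None  # e.g. a scanned PDF with no extractable text
    largest = max(size for _, size in candidates)
    title_parts = []
    for text, size in candidates:
        if size == largest:
            title_parts.append(text)
        elif title_parts:
            break  # the run of largest-font lines has ended
    return " ".join(title_parts)


first_page = [
    ("Journal of Examples, Vol. 1", 9.0),
    ("Extracting Metadata", 18.0),
    ("from PDF Full Text", 18.0),
    ("A. Author, B. Author", 11.0),
    ("Abstract", 10.0),
]
print(guess_title(first_page))
```

A heuristic like this fails on pages where something other than the title uses the largest font, which is one reason why accuracy well below 100% is to be expected.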

  • Costas

    There is a standalone program that does a similar thing, though it does require input from the user. It reads the PDF itself and allows you to choose / correct the output.


    it is free as well!

    I don’t know how well your solution will work; some PDFs are quite large, and imagine if hundreds of users try to upload their thousands of PDFs at the same time!

  • Airbird


    There is no “your version of JabRef” available at the link you provided in this article.

    Another question: the video you provided shows very convenient features for a big PDF database, but in reality there are no such functions in the latest version. What can we expect in the future? Do you have a fixed date for the full implementation of those functions?

    Thanks a lot for this nice work!

  • carly


    I tried to use your version of JabRef. Everything works fine until after the PDF has been detected by Mr. dLib.
    In the Mr. dLib window all data is displayed correctly, but the generated BibTeX key contains only weird characters.

    I also tried JabRef 2.7B. How is the function activated?

    Feel free to answer in German/English.
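The cause of the garbled key is not stated in the thread. One plausible source of such problems in general is non-ASCII characters in author names. The sketch below shows one way to build an ASCII-only key; the `Surname` + `Year` scheme and the function name `make_bibtex_key` are illustrative assumptions and not JabRef's actual key generator.

```python
import re
import unicodedata


def make_bibtex_key(first_author_surname, year):
    """Build an ASCII-only key from a surname and a year (illustrative scheme)."""
    # Decompose accented letters into base letter plus combining mark,
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", first_author_surname)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep letters and digits only; spaces and punctuation are removed.
    cleaned = re.sub(r"[^A-Za-z0-9]", "", ascii_name)
    # Fall back to a placeholder when nothing survives the cleaning.
    return f"{cleaned or 'Unknown'}{year}"


print(make_bibtex_key("Müller", 2010))
```

Letters without an ASCII decomposition are simply dropped by this approach, so a real key generator would likely need a transliteration table as well.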

  • Nicky

    I cannot get the PDF metadata import to work with your modified JabRef version. I get stuck at step 2: when I drag & drop a PDF file into the JabRef table, nothing happens. I am running Kubuntu 10.10. Does this have something to do with the SciPlore/FreeMind configuration?

    When starting JabRef from a terminal, I get this after an import failure:

    Exception in thread "AWT-EventQueue-0" java.lang.NoClassDefFoundError: freemind/controller/MindMapNodesSelection
    at net.sf.jabref.groups.EntryTableTransferHandler.importData(EntryTableTransferHandler.java:152)
    at javax.swing.TransferHandler.importData(TransferHandler.java:772)
    at javax.swing.TransferHandler$DropHandler.drop(TransferHandler.java:1495)
    at java.awt.dnd.DropTarget.drop(DropTarget.java:446)
    at javax.swing.TransferHandler$SwingDropTarget.drop(TransferHandler.java:1220)
    at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(SunDropTargetContextPeer.java:529)
    at sun.awt.X11.XDropTargetContextPeer.processDropMessage(XDropTargetContextPeer.java:183)
    at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(SunDropTargetContextPeer.java:842)
    at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(SunDropTargetContextPeer.java:766)
    at sun.awt.dnd.SunDropTargetEvent.dispatch(SunDropTargetEvent.java:48)
    at java.awt.Component.dispatchEventImpl(Component.java:4419)
    at java.awt.Container.dispatchEventImpl(Container.java:2163)
    at java.awt.Component.dispatchEvent(Component.java:4390)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4461)
    at java.awt.LightweightDispatcher.processDropTargetEvent(Container.java:4196)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4050)
    at java.awt.Container.dispatchEventImpl(Container.java:2149)
    at java.awt.Window.dispatchEventImpl(Window.java:2478)
    at java.awt.Component.dispatchEvent(Component.java:4390)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:649)
    at java.awt.EventQueue.access$000(EventQueue.java:96)
    at java.awt.EventQueue$1.run(EventQueue.java:608)
    at java.awt.EventQueue$1.run(EventQueue.java:606)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:116)
    at java.awt.EventQueue$2.run(EventQueue.java:622)
    at java.awt.EventQueue$2.run(EventQueue.java:620)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:619)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
    Caused by: java.lang.ClassNotFoundException: freemind.controller.MindMapNodesSelection
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    … 37 more
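The trace above ends in a `ClassNotFoundException` for `freemind.controller.MindMapNodesSelection`, which means the class was not on the classpath when the drop was handled. A quick way to check whether any jar in an installation ships that class is to scan the jars, as in the sketch below. The helper name `jars_containing` and the directory passed to it are assumptions; the thread does not say where the jars are installed.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, directory):
    """Return the names of .jar files under `directory` that contain `class_name`."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(directory).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits


# Example: scan the current directory (replace "." with the install folder).
print(jars_containing("freemind.controller.MindMapNodesSelection", "."))
```

An empty result would suggest that the jar providing the class is missing from that folder; a non-empty one would point to the jar not being included on the classpath at startup.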

  • filippo

    Hi, could you please let us know where things stand with the possibility of interacting with Mendeley from SciPlore?
    Your answer will help me decide whether I need to invest time in this or not.

    • Mendeley can continuously export a BibTeX file which can be read with SciPlore (there are some bugs which will be fixed in the next Beta). Other interaction, such as drag & drop between Mendeley and SciPlore MindMapping, is not possible at the moment. But I like the idea and will ask the Mendeley team if they are interested in cooperating on this point.

  • Fred

    Hi there,
    How is the JabRef and metadata extraction code going?

  • Tim

    I just started using JabRef, but cannot get it to load a PDF file at all. When I try to drag and drop, it just gives me the empty sign. When I use the load pdf ps command, it says “No pdf or ps defined”, even though it never offered me a Windows file dialog to choose one. When I try to load a file, there are only two options, “bib” and “all files”, and no pdf or ps. I badly need help, thanks.
