This session was presented by Margeaux Johnson, Nicholas Rejack, Alex Rockwell, and Paul Albert.
Margeaux started by talking about VIVO’s origins. It has not fully launched yet, but it is already being used and tested at many institutions. It helps researchers discover other researchers. It originated at Cornell, was made open source, and was funded by a $12.5 million grant; the project involves 120+ people at dozens of public and private institutions. VIVO harvests data from verified sources such as PubMed, human resources databases, organizational charts, and a grant data repository. This data is stored as RDF and then made available as web pages. VIVO allows researchers to map colleagues, showcase credentials and skills, connect with researchers in their areas, and simplify reporting tasks; in the future it will self-create CVs and incorporate external data sources and applications. So why involve libraries and librarians? Libraries are neutral, trusted entities and technology centers with a tradition of service and support. Librarians know their organizations, can establish and maintain relationships with their clients, understand their users, and are willing to collaborate. There is a VIVO Conference here in DC in August, where you can learn a ton more.
Nick then talked about why the semantic web was chosen for this project. The local data flow in VIVO is relatively simple, and a cool feature allows all seven operational VIVO instances to connect with one another, somewhat like a federated search. Because the data must stay authoritative, URIs are used to uniquely identify and track data about individual people within the system.
Paul then covered how the VIVO ontology is structured. The data in VIVO is stored using the Resource Description Framework (RDF). A sample semantic representation of the system’s data was displayed, connecting people who wrote articles together. VIVO can also create inferences for you. The ontology draws on several existing vocabularies for classifying data: Dublin Core, the Event Ontology, FOAF, geopolitical classifications, SKOS, and BIBO. Several very complicated charts showed how different data in VIVO is connected. To model a person, the data set includes that person’s research, teaching, service, and expertise. Different institutions require different localizations; he described how to create a localization in VIVO, but cautioned that such customizations will not necessarily work across institutions. He recommends the book Semantic Web for the Working Ontologist.
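To make the idea of RDF triples and inference concrete, here is a minimal sketch in plain Python (not VIVO’s actual code; the URIs and property names like vivo:authorOf are hypothetical stand-ins). It stores statements as (subject, predicate, object) triples and infers co-authorship between two people who wrote the same article, in the spirit of the chart Paul displayed.

```python
# RDF-style triples as plain (subject, predicate, object) tuples.
# URIs and property names below are illustrative, not VIVO's real vocabulary.
triples = {
    ("http://vivo.example.edu/person/n123", "foaf:name", "Jane Smith"),
    ("http://vivo.example.edu/person/n456", "foaf:name", "Wei Chen"),
    ("http://vivo.example.edu/person/n123", "vivo:authorOf", "bibo:article/a1"),
    ("http://vivo.example.edu/person/n456", "vivo:authorOf", "bibo:article/a1"),
}

def infer_coauthors(triples):
    """Infer a co-authorship triple for every pair of people
    who authored the same article."""
    authors_of = {}
    for s, p, o in triples:
        if p == "vivo:authorOf":
            authors_of.setdefault(o, set()).add(s)
    inferred = set()
    for article, authors in authors_of.items():
        for a in authors:
            for b in authors:
                if a != b:
                    inferred.add((a, "vivo:coAuthorWith", b))
    return inferred

coauthors = infer_coauthors(triples)
```

Because both people share an authorOf link to the same article, the function derives a coAuthorWith statement in each direction, which is the kind of new fact an RDF reasoner can add without anyone entering it by hand.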
Nick talked about the importance of authoritative data in VIVO and of preserving data quality. The data comes in many forms: databases, CSV, XML, XSLT, RDF, etc. These all go through a loading process: load the desired ontologies, upload the data into VIVO, map the data to the ontology, and finally perform data sanitization to fix mistakes and inconsistencies.
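The map-and-sanitize steps above can be sketched as follows. This is a simplified illustration under assumed inputs, not the actual VIVO harvester: a CSV of people is mapped onto RDF-style triples, then cleaned by dropping empty values and stripping stray whitespace. Column names and the URI pattern are hypothetical.

```python
import csv
import io

# Hypothetical source data: note the trailing space and the missing name,
# the kinds of inconsistencies the sanitization step is meant to catch.
raw_csv = """person_id,name,department
n123,Jane Smith ,Chemistry
n456,,Physics
"""

def load_rows(text):
    """Load the source data (here, CSV) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def map_to_triples(rows):
    """Map each row onto ontology properties (simplified)."""
    triples = []
    for row in rows:
        uri = f"http://vivo.example.edu/individual/{row['person_id']}"
        triples.append((uri, "foaf:name", row["name"]))
        triples.append((uri, "vivo:department", row["department"]))
    return triples

def sanitize(triples):
    """Drop triples with empty values and strip surrounding whitespace."""
    return [(s, p, o.strip()) for s, p, o in triples if o and o.strip()]

clean = sanitize(map_to_triples(load_rows(raw_csv)))
```

After sanitization the empty name for n456 is dropped and "Jane Smith " loses its trailing space, leaving only well-formed statements to load into VIVO.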
Alex concluded the session by talking about the ins and outs of working with VIVO data. The easiest way is to crawl the RDF; you can also use SPARQL queries. The University of Florida doesn’t have a facility for creating organization charts, and what they do have is locked in inaccessible formats, so the charts were hand-curated. When Alex wrote the program to handle this there were 500 people in it; now there are over 1,000. The design includes a data crawl, serialization, formatting, and then export into text, graph visualizations, etc. VIVO also has a WordPress plug-in that exports data into WordPress sites and blogs. Cornell had a Drupal site, and a module was created to import VIVO data. They’re working on developer APIs to expose VIVO data as XML or JSON, to install a SPARQL file, etc. He also created an application called Report Saver, which lets you enter a SPARQL query, save it, and pull out data on a regular basis for analysis.
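To give a feel for the kind of query a tool like Report Saver would save and re-run, here is a toy sketch. A real deployment would send SPARQL (e.g. SELECT ?person ?article WHERE { ?person vivo:authorOf ?article }) to a VIVO endpoint; this stand-in matches an equivalent basic graph pattern against an in-memory triple list, and the data and property names are invented for illustration.

```python
# A single SPARQL-style triple pattern: "?"-prefixed terms are variables.
query_pattern = ("?person", "vivo:authorOf", "?article")

# Toy data standing in for crawled VIVO RDF.
triples = [
    ("ex:n123", "vivo:authorOf", "ex:a1"),
    ("ex:n123", "foaf:name", "Jane Smith"),
]

def match(pattern, triples):
    """Return one variable-binding dict per triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        matched = True
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val      # bind the variable to this term
            elif pat != val:
                matched = False         # constant term must match exactly
                break
        if matched:
            results.append(binding)
    return results

results = match(query_pattern, triples)
```

Saving the pattern and running this match on a schedule is, in miniature, what pulling out data "on a regular basis for analysis" looks like.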