Current Projects of Prof. Dr. Piek Th.J.M. Vossen
BiographyNed: (From the Biography Portal of the Netherlands to a Virtual Society of People Past and Present): VU/UvA eScience-project (2012-2016)
The Biography Portal of the Netherlands links a wide variety of Dutch online reference works and data sets, written in different times and from different perspectives, through a limited set of metadata. This project aims to enhance the portal’s potential for historical research on its ‘virtual community’ of the more than 100,000 Dutch people mentioned in the various linked databases. What historical narrative can be generated from digital access to these dictionaries of biography, lexicons and portrait collections? The lead question for the design of a demonstrator is: how can we generate relationships between people and events, geographical movements of people and networks between people from the Biography Portal, and what do they tell historians about the formation of Dutch society and the ‘boundaries of the Netherlands’?
The current search engine, although it allows free full-text search, generates only high-level search results. As such, the portal so far ‘only’ provides a series of linked online resources. It lacks analytic tools to show interconnections, trends, geographical maps, time lines, etc. This research project aims to strengthen the value of this portal and comparable biographical datasets for historical research by improving the search options and the presentation of search outcomes, starting from the Simple Event Model. This will enable users to trace different perspectives on connections between events, based on (distant or close) connections between persons, places and time. In addition, it will create connections between biographical information and museum objects, especially portraits, both in terms of who is depicted and of who made and/or owned the work. The demonstrator will add a semantic layer onto the current Biography Portal. The pilot will focus on a qualitative selection of links relevant to the National Portrait Gallery that is being developed by the Rijksmuseum. The series of governors-general portraits at the Rijksmuseum provides a good starting point for this pilot on the networks of these people, their involvement in various events, and their role in the creation of Dutch society.
We build on the basis provided by the project Agora and on the VU-CAMeRA project Semantics of History. From these we derive a clear view on the requirements for representing events, on techniques for discovering events, and on a shared (computational, historical) frame of reference for interpreting historical events. For text mining, we re-use the framework developed in the Asian-European project KYOTO for mining events from text using text properties and semantic resources, across different languages.
Cornetto-LMF-RDF: (Curated Cornetto database in LMF and RDF): a CLARIN project (2012-2013)
Project coordinator of Cornetto-LMF-RDF, which is a combined curation and demonstrator project in which the Dutch Cornetto database is converted to LMF and RDF and made available on a CLARIN Centre for efficient querying. As a semantic resource in which words and concepts are interlinked within the data and to other databases (e.g. wordnets in other languages and ontologies) this project will address many issues on the representation of meaning and user-queries to these data, such as the complex data structure (semantic and structural) and semantic linkage, such as hypernym chains of concepts or semantic typing of words. The project will combine a new release of Cornetto (version 2) with the data from DutchSemCor (a semantic annotation of text corpora) and a Dutch sentiment lexicon. The results are presented in LMF and the wordnet part also in RDF and SKOS. This bridges the standardization and metadata requirements of ISO and W3C.
SIERA: ("Integrating Sina Institute into the European Research Area"): 7th EU Framework project (2012-2014)
Associate partner of SIERA, which aims to reinforce closer and sustainable scientific cooperation between Palestinian and EU scientists in the field of multilingual and multicultural knowledge sharing technologies. This objective is attained through integrating BZU Sina Institute, which is the largest ICT research centre in Palestine and among a few in the Arab world in this field, into the European Research Area. Two EU multilingual knowledge sharing portals (which were developed in previous FP7 and eTen projects) have been selected as a concrete testbed for establishing scientific collaboration and integration. The first, MICHAEL, is a cultural heritage portal which provides a multilingual service to explore digital collections from museums, archives, libraries and other cultural institutions from across Europe. The second, KYOTO (with Vossen as project coordinator), is a wiki-portal about environment and ecology. The key idea is to use them to investigate how to extend large-scale knowledge sharing portals with Arabic language and content. Both portals already support multilingual knowledge sharing. Extending such portals to support Arabic content and semantic search is a challenging task, both because of the complexity of the language and because the Arabic content needs to be semantically interlinked with EU content. The MICHAEL and KYOTO portals were selected carefully, not only because their application domains are important areas of interest for EU and Arab societies and markets, but also because extending them with Arabic is, from a scientific viewpoint, a good case for setting up joint research and cooperation, exchanging knowledge, and concretely tuning in-house methodologies and tools.
DutchSemCor: ("Dutch corpus with word senses from the Cornetto database"): NWO-project 380-70-011 (2009-2012)
Project coordinator of DutchSemCor, which aims to deliver a one-million-word Dutch corpus that is fully sense-tagged with senses and domain tags from the Cornetto database (STEVIN project STE05039). 250K words of this corpus will be manually tagged. The remainder will be automatically tagged using three different word-sense-disambiguation (WSD) systems and will be validated by human annotators. The corpus data will be based on existing corpus material collected in the projects CGN, D-CoI and SoNaR. These corpora have already been automatically annotated with morpho-syntactic tags and structures. The corpora will be extended where necessary to find sufficient examples for meanings of words that are less frequent and do not appear in the above corpora. The resulting corpus, for which we aim to offer the same balance in types of text as these basic resources, will be extremely rich in terms of lexical semantic information. Its availability will enable many new lines of research and technology developments for the Dutch language. In particular, it will enable research into the relation between language form and language interpretation, and as such it will be applicable in the fields of cognitive science, (psycho-)linguistics, language learning and language teaching, semantic web applications, information retrieval, machine translation, text mining, and document interpretation (summarization, topic segmentation). We foresee that the corpus will create new directions of research and technology development on a par with current developments for English. (news VUA).
-
The research project Text2Politics combines contemporary theories and methods in linguistics and political science to develop an automated research tool for rich text-mining. The transdisciplinary relevance of the project is that a carefully constructed mining tool for language-meaning research can be applied to enhance the Kieskompas (Electoral Compass) and prove useful in the social sciences in general. The research will give new insights into the complexity of language use, the linguistic modeling of subjectivity and the representation of this knowledge in a lexicon. It will also shed new light on the complex dimensionality of competition between political parties. The work is carried out by three AIOs who are based at the Faculty of Social Sciences and the Faculty of Arts and is funded by the Interfaculty research institute CAMeRA.
-
The research project Semantics of History develops a historical ontology and a lexicon that are used in a new type of information system that can handle the time-based dynamics and varying perspectives in historical archives. The system will integrate new insights in the ontological and linguistic analysis of the data that will follow from empirical and fundamental research. The work is carried out by two AIOs who are based at the Faculty of Exact Sciences and the Faculty of Arts and is funded by the Interfaculty research institute CAMeRA.
History is typically a record of different realities in time and specifically focuses on the changes in reality. Moreover, the perception of history can differ between participants and between cultural and linguistic groups. Finally, the reflection on the past can differ depending on our views: history has been and will be re-written many times. Information systems for historical archives should handle these dynamics in time and represent all realities at an equal level, while at the same time defining the relations, the invariants and the changes across the realities. The units of change are events, and in history events can typically be organized at different levels of change. The most constant elements are locations, people and dates, but many different structures are still possible, and these need to be related to the more constant elements. Such a system should also allow users to classify and structure reality from any possible perspective when accessing the archives.
Vast amounts of historical data are available as free text. The text itself can be related in time, just as the events can. For direct reporting and communication in the same time-frame there will be little distance between the communication date and the event date. Historical documents, on the other hand, have a large distance between reporting date and event date. We also expect that the linguistic expressions for naming these events will be different, exhibiting higher abstraction and other types of perspectives in historical reports as compared to actual news reports. A historical information system requires an innovative view on the semantics of events and the ways we can conceptualize these through language in different genres of documents.
Global WordNet Grid: a GWA Project (2006-ongoing)
In 2006 Vossen launched the Global Wordnet Grid: the building of a complete, free, worldwide wordnet grid. This grid will be built around a shared set of concepts, such as the Common Base Concepts used in many wordnet projects. These concepts will be expressed in terms of Wordnet synsets and SUMO definitions. People from all language communities are invited to upload synsets from their language to the Grid. Gradually, all languages will then be represented in the Grid. The Grid will be available to everybody and will be distributed completely free.
-
Global Wordnet Association (2000-ongoing)
Vossen is Founder and President of the Global WordNet Association. He founded GWA (with Christiane Fellbaum of Princeton University) in 2000 as a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. For more information see:
- Global Wordnet Association
- Sixth Global WordNet Conference 2012 in Matsue, Japan, January 9-13, 2012
- Fifth Global WordNet Conference 2010 in Mumbai, India, January 31 - February 4, 2010
- Fourth Global WordNet Conference 2008 in Szeged, Hungary, January 22-25, 2008
- Third Global Wordnet Conference 2006 in Jeju Island, Korea, January 22-26, 2006
- Second Global Wordnet Conference 2004 in Brno, Czech Republic, January 20-23, 2004
- First Global Wordnet Conference 2002 in Mysore, India, January 21-25, 2002