Piek Vossen (1960) is currently full-time Professor of Computational Lexicology at the Faculty of Arts, department Language, Cognition and Communication (LCC) at the Vrije Universiteit Amsterdam. He is the head of the Computational Lexicology & Terminology Lab (CLTL), Member of the Management Team of CAMeRA (the Interfaculty research institute of Sociology, Computer Science, Clinical Psychology and Language & Communication) and the Founder and President of the Global WordNet Association (GWA).
Vossen has published more than 200 (peer-reviewed) articles in national and international journals, conference proceedings, book chapters and (hand)books. He has given invited lectures at several conferences and other occasions, is a regular organizer of and referee for (inter)national conferences and journals, and has served on several program committees and organizing committees of workshops and conferences. He also serves as a member of PhD committees in the Netherlands as well as abroad, and in the last two years four PhD researchers from abroad obtained funding to become guest researchers at the VU to work with Prof. Vossen.
Piek Vossen studied Dutch and General Linguistics at the University of Amsterdam. In 1995, he received his PhD (cum laude) in Linguistics on Computational Lexicology and Lexicography. He has been involved in the following (EC-)projects: Links, Acquilex-I and II, Sift, EuroWordNet I and II, Meaning, Euroterm, Balkanet and Pidgin, Cornetto, DutchSemCor, Text to Political Positions, Semantics of History and KYOTO.
His PhD thesis, entitled "Grammatical and Conceptual Individuation in the Lexicon" (published by IFOTT, Amsterdam) describes a new model, the so-called Anchored Relational Model, for defining the syntactic and semantic properties of English and Dutch nouns based on differences in their pragmatic use. Using formal computer representations, the semantic and syntactic properties can be correlated in complex ways, making a further distinction between a cognitive level of meaning and a lexical semantic level (See also review in International Journal of Lexicography, 1998; 11: 73 - 79).
From 1986 to 1998, Vossen was a senior researcher at the University of Amsterdam, where he was responsible for a series of research projects, both national (LINKS) and international (ACQUILEX I & II, SIFT). Vossen was project coordinator of the EuroWordNet I and II projects. The aim of the EuroWordNet project was to develop a multilingual database with wordnets for Dutch, Italian, Spanish, French, German, Czech, Estonian and English. In this database, the concepts in the wordnets are interconnected via an Inter-Lingual-Index. The EuroWordNet model is now used by many other groups and projects to build wordnets for their languages and to link them to the database. He was the writer and editor of the book "EuroWordNet: a multilingual database with lexical semantic networks" (Kluwer Academic Publishers, 1998) and wrote several chapters in prominent handbooks in the fields of Ontologies, Linguistics, Language, Lexicography, etc.
For many years he combined his academic career with work in industry. He worked as Senior Manager at Sail Labs (1999-2001) in Antwerp, Belgium, a long-term research laboratory developing language technology of the future. After that he worked as CTO of Irion Technologies B.V. (2001-2009), where he developed multilingual language technology for many different languages. The main products were a cross-lingual concept-based search engine, a document classification system, and a question-answer system. All products are available for English, Dutch, German, French, Italian and Spanish.
Since 2000, he has been Founder and President (with Christiane Fellbaum) of the Global WordNet Association (GWA), and in this function he organized the First Global Wordnet Conference (India, 2002), the Second Global Wordnet Conference (Czech Republic, 2004), the Third Global Wordnet Conference (Korea, 2006), the Fourth Global Wordnet Conference (Hungary, 2008), the Fifth Global Wordnet Conference (India, 2010), the 6th Global WordNet Conference (Matsue, Japan, 2012) and the 7th Global WordNet Conference (Estonia, 2014). Since the start of GWA, wordnets have been developed for all European languages, many Asian languages, African languages and even dialects (Welsh) and dead languages (Latin). All wordnets (almost 100) are linked to the Inter-Lingual-Index through the same model. The Global WordNet Association has stimulated the development of wordnets all over the globe, which are all interconnected through the same database. In addition to supporting and advising many wordnet projects, Vossen has also been actively involved in projects developing wordnets, such as BalkaNet and the development of the Arabic wordnet ("Constructing Arabic Wordnet (AWN) in Parallel with an Ontology", sponsored by the American government and headed by Princeton University), for which he was responsible for the European part of the project. In February 2006, the idea of the Global Wordnet Grid was launched at the 3rd GWC in Jeju, Korea: the building of a complete, free, worldwide wordnet grid. This grid will be built around a shared set of concepts, such as the Common Base Concepts used in many wordnet projects. These concepts will be expressed in terms of Wordnet synsets and ontology concepts. People from all language communities are invited to upload synsets from their language to the Grid. Gradually, all languages will then be represented in the Grid. The Grid will be available to everybody and will be distributed completely free.
In April 2006 (full-time since November 2009), Vossen was appointed Professor of Computational Lexicology at the Faculty of Arts, department Language, Cognition and Communication (LCC) at the Vrije Universiteit Amsterdam. His research interests are WordNets, Computational Lexicon, Ontologies, Computational Linguistics, Language Technology and Computer Applications. His research covers wordnets and computational lexicons, both within a single language and from a multilingual perspective. Vossen is interested in the relation between lexicons and ontologies, from a theoretical point of view as well as from their usage in computer applications in which meaning and interpretation play a role. He sees the lexicon as a fundamental resource to anchor meaning and interpretation in useful computer behaviour. Computer behaviour can make use of communicative models and insights from communication science. The organization of the lexicon and the knowledge stored in it need to take that usage as a starting point.
As a professor at the VU University Amsterdam, he initiated a number of projects. From 2006 to 2009, he was the project coordinator of "Cornetto" (Combinatorial Relational Network for Language Applications), financially supported by the Dutch-Flemish Language Union in the Stevin program. Cornetto combined the Dutch wordnet and the Referentie Bestand Nederlands (a Dutch database with combinatoric information of Dutch word meanings) in a unique resource for Dutch. Cornetto covers 40K entries, including the most generic and central part of the language. The database goes beyond the structure and content of Wordnet and FrameNet. The Cornetto database is available for download: free for non-commercial use and 15,000 euros for commercial use. A demo is also available.
From March 2008, Vossen was the project coordinator of the 7th EU Framework project "Knowledge Yielding Ontologies for Transition-based Organization" (KYOTO) in the area Digital Libraries. KYOTO makes knowledge sharable between communities of people, cultures, languages and computers, by assigning meaning to text and giving text to meaning through the development of a cross-lingual and cross-cultural knowledge and information transition system that is applied to the domain of the environment in Dutch, English, Italian, Spanish, Chinese and Japanese. Vossen organized two international workshops for the KYOTO project: in February 2009, the 1st International KYOTO Workshop on "Environmental Knowledge Transition and Exchange" in Artis, Amsterdam, the Netherlands, and in January 2011, the 2nd International KYOTO Workshop on "Advanced Information Systems for sharing information and Knowledge about the Environment" in Gifu, Japan.
As of September 2009, Vossen is coordinating an NWO-funded project, "DutchSemCor". The goal of DutchSemCor is to deliver a one-million-word Dutch corpus that is fully sense-tagged with senses and domain tags from the Cornetto database (STEVIN project STE05039). The corpus, which aims to offer the same balance in types of text as these basic resources, will be extremely rich in terms of lexical semantic information. Its availability will enable many new lines of research and technology developments for the Dutch language. In particular, it will enable research into the relation between language form and language interpretation, and as such it will be applicable in the fields of cognitive science, (psycho-)linguistics, language learning and language teaching, semantic web applications, information retrieval, machine translation, text mining, and document interpretation (summarization, topic segmentation). We foresee that the corpus will create new directions of research and technology development on a par with current developments for English.
In September 2009, two other research projects were launched, funded by CAMeRA, the Interfaculty research institute: "Semantics of History" and "From sentiments and opinions in texts to positions of political parties" (Text2Politics). The Semantics of History project develops a historical ontology and a lexicon that are used in a new type of information system that can handle the time-based dynamics and varying perspectives in historical archives. Text2Politics combines contemporary theories and methods in linguistics and political science to develop an automated research tool for rich text mining. The transdisciplinary relevance of the project is that a carefully constructed mining tool for language-meaning research can be applied to enhance the Kieskompas (Electoral Compass) and prove useful in the social sciences in general. The research will give new insights into the complexity of language use, the linguistic modeling of subjectivity and the representation of this knowledge in a lexicon. It will also shed new light on the complex dimensionality of competition between political parties.
As of May 2012, Vossen is the project coordinator of Cornetto-LMF-RDF, a combined curation and demonstrator project in which the Dutch Cornetto database is converted to LMF and RDF and made available at a CLARIN Centre for efficient querying. Because Cornetto is a semantic resource in which words and concepts are interlinked within the data and to other databases (e.g. wordnets in other languages and ontologies), this project will address many issues concerning the representation of meaning and user queries to these data, such as the complex data structure (semantic and structural) and semantic linkage, for example hypernym chains of concepts or semantic typing of words. The project will combine a new release of Cornetto (version 2) with the data from DutchSemCor (a semantic annotation of text corpora) and a Dutch sentiment lexicon. The results are presented in LMF and the wordnet part also in RDF and SKOS. This bridges the standardization and metadata requirements of ISO and W3C.
In 2012 he is also a project member of the EC 7th Framework project SIERA ("Integrating Sina Institute into the European Research Area"), which aims to reinforce closer and sustainable scientific cooperation between Palestinian and EU scientists in the field of multilingual and multicultural knowledge sharing technologies. This objective is attained by integrating the BZU Sina Institute, which is the largest ICT research centre in Palestine and among a few in the Arab world in this field, into the European Research Area. Two EU multilingual knowledge sharing portals (which were developed in previous FP7 and eTen projects) have been selected as a concrete testbed for establishing scientific collaboration and integration. The first, MICHAEL, is a cultural heritage portal which provides a multilingual service to explore digital collections from museums, archives, libraries and other cultural institutions from across Europe. The second, KYOTO (with Vossen as project coordinator), is a wiki-portal about environment and ecology. The key idea is to use them to investigate how to enable large-scale knowledge sharing portals with Arabic language and content.
As of July 2012, Vossen is a project partner of OpeNER ("Open Polarity Enhanced Named Entity Recognition"), another 7th EU Framework project. OpeNER will focus on Sentiment Analysis (SA) services and proposes the reuse and repurposing of existing lexical resources, Linked Data and the broader Social Internet. OpeNER will focus on ES, NL, FR, IT, DE and EN, and create a generic multilingual graduated sentiment data pool reusing existing language resources (WordNets, Wikipedia) and automatic techniques. The Sentiment Lexicon will supplement popular or proprietary lexicons. The Lexicon will be expressed in a new mark-up format. OpeNER will also create an online development portal and community to host data, libraries, APIs and services. Tasks focused on implementing models to ensure long-term self-sustainability and on options for Open Licensing are included. It will provide base qualifying technologies and a means for continued development and extension to other languages and domains, freeing SMEs to concentrate their efforts on providing innovative solutions to meet market needs rather than on expensive development of core technologies.
Furthermore, as of September 2012, a new VU/UvA eScience project is funded: BiographyNed ("From the Biography Portal of the Netherlands to a Virtual Society of People Past and Present"). The Biography Portal of the Netherlands links a wide variety of Dutch online reference works and data sets, written in different times and from different perspectives, through a limited number of metadata. This project aims to enhance its research potential for historical research on the portal’s ‘virtual community’ of the more than 100,000 Dutch people mentioned in the various linked databases. What historical narrative can be generated from digital access to these dictionaries of biography, lexicons and portrait collections? The lead question for the design of a demonstrator is: how can we generate relationships between people and events, geographical movements of and networks between people from the Biography Portal, and what do they tell historians about the formation of Dutch society and the ‘boundaries of the Netherlands’? Vossen is also a project member in two KNAW grants for The Network Institute for an Academy Assistants program, called “Social networks: working at the interface of social sciences and computer science”: "Depression" and "SPREAD."
As of 2013, Vossen will be the project coordinator of another EC 7th Framework project: NewsReader, a "Recorder of History", which is a computer program that "reads" daily streams of news and stores exactly what happened, where and when in the world, and who was involved. The program uses the same strategy as humans by building up a story and merging it with information stored previously. The software does not store separate events but a chain of events according to a story line. Like humans, the program thus removes duplicate information and complements incomplete information in the news while reading. In the end, it maintains a single story line for the events. Unlike humans, the recorder will not forget any detail, will be able to recall the complete and true story as it was told, and will know who told what part of the story and what sources contradicted each other. The history recorder can be seen as a new way of indexing and retrieving information that helps decision makers to handle billions of news items in archives and millions of incoming news items every day. Current solutions simply result in long lists of potentially relevant items due to the abundance of information. It is up to the user to sift through these results: removing duplication, putting pieces together and separating correct from incorrect information. Consequently, it is often impossible to make truly well-informed decisions. The history recorder, however, is able to structure these results according to story lines, where it presents the information as a single and complete history. In addition to organizing news as stories, the recorder also has the capacity to abstract from individual stories and to find trends and patterns. It can, for example, provide a quantified overview of the types of companies that are involved in take-overs in specific periods or regions, and correlate that with changes in management and profits.
Since it keeps track of all the original sources of the information, the recorder can also provide insight into how the story was told. This will tell us about the different perspectives of sources on our news of today and of the past.
In addition to the research projects, Vossen is involved in a number of committees, networking projects and standardisation initiatives, such as the ANSI committee for ontology standardisation and the EAGLES/ILSE project. Since May 2009, Vossen has been a member of CLARIN-NL, which aims to design, construct, validate, and exploit a research infrastructure that is needed to provide a sustainable and persistent eScience working environment for researchers in the Humanities, and Linguistics in particular, who want to make use of language resources and the technology to use these resources for their research. He was one of the members of FLaReNet (Fostering Language Resources Network), another 7th EU Framework Project. The major activities of the Network were to survey, analyse and classify language resources (LRs) and relevant standards, together with their organisational and economic models, and to discuss with major stakeholders and players new common strategies for a capillary deployment and use of LRs in real-world products.
In 2007, Vossen was one of the initiators and organizers of IWIC2007, the First International Workshop on Intercultural Collaboration (Kyoto, 2007), after which a website called Intercultural Collaboration Gateway was launched with the aim of gathering information about intercultural collaboration. He has been invited as expert consultant for many (EU) projects, such as "Van Dale Groot Woordenboek der Nederlandse Taal", "Corpus Gesproken Nederlands", "Euroterm" and "Balkanet". He was also a member of the NWO IMIX (Interactieve Multimodale Informatie Extractie) program and the STEVIN (Spraak- en Taaltechnologische Essentiele Voorzieningen In het Nederlands) program from the Nederlandse Taalunie, NWO, AWI, and the FWO and IWT. Vossen is a member of the Netherlands Graduate School of Linguistics "LOT" (Landelijke Onderzoekschool Taalwetenschap), and in July 2009 he was invited as a member of the Advisory Board of the Taalbank Nederlands (GTB) of the Institute of Dutch Lexicography (INL). As of 2011, he is an expert consultant for the EC regarding the Framework Calls, and he became an invited member of the World Wide Web Consortium (W3C) Ontology-Lexica Community Group, which aims at standardization of lexicon models for ontologies. Furthermore, Vossen is an invited member of the Human Language Technologies (HLT) Committee, which is responsible for the further development of the ‘HLT collaboration between South Africa and the Nederlandse Taalunie’. He has also been an invited member of the Koninklijke Hollandsche Maatschappij der Wetenschappen since 2012.
As of September 2009, Vossen is also a Member of the Management Team in charge of scientific research of CAMeRA, the Interfaculty research institute. The Scientific Advisory Board consists of prominent members from the four (i.e. Sociology, Computer Science, Clinical Psychology and Language & Communication) participating faculties and research groups. Vossen coordinated the initiation of four interfaculty programs, led by key persons in the four faculties: e-Health & Care, e-Learning and Entertainment, e-Heritage and e-Language. Currently, four research visions have been developed for each program, and researchers from the faculties have been contacted and associated with the groups. Vossen also organized a number of events to bring the CAMeRA PhD students together and let them interact. Finally, he represents an important bridge between the leading faculty of CAMeRA (FSW) and the other faculties. In 2011, he joined The Network Institute, a part of the VU University Amsterdam which carries out three interdisciplinary research programmes focused on themes of prime societal relevance: The Connected World, Web and e-Science, and Centre for Organization Studies.
Up to now, his professorship has focused on research. His chair has only limited staff for education and cannot support a full BA or master. Besides a yearly lecture course "Introduction to the Computational Lexicon" for 3rd-year BA students and a yearly lecture "ICT technieken", Vossen recently initiated a new specialization, Digital Humanities, for BA students at the Faculty of Arts, which will contribute to the formalization and empirical grounding of Humanities as a broad discipline. The interdisciplinary course is intended for linguists, computer science students, historians, literature students and students in social sciences. The proposal has been accepted as part of the newly proposed BA Language and Communication at the VU. As a follow-up, he and his team will develop a master or master module. Furthermore, master classes are being developed on wordnets and terminology.
In 2011 and 2012, Vossen was invited to give a lecture course at the LOT Winterschool. The 5 lectures (on EuroWordNet: Vossen e.a., WordNet: Miller e.a., GlobalWordNet: Vossen e.a.) focus on the semantic networks of words and concepts and their application in natural language processing. The course provides background information on wordnets and discusses many issues involved in building wordnets, comparing wordnets and using them in NLP applications.