2017 Edition > Workshops

Monday 26/06

Introduction to Digital Humanities - Elena Pierazzo

What are the Digital Humanities? Why is it important to know about them? What are they for? This introductory course provides a theoretical framework for the week's courses.

Course taught in French

XML - Elina Leblanc

This course covers the basics of the XML language, which is essential for working with TEI and HTML. A set of exercises will let participants practise XML syntax and get to know the Oxygen XML Editor software.
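
As a minimal illustration of the kind of syntax rules such exercises involve (a single root element, tags closed in the order they were opened, quoted attribute values), the sketch below checks two small documents for well-formedness. It uses Python's standard `xml.etree.ElementTree` module as a stand-in for an XML editor; the sample `letter` documents and the `is_well_formed` helper are invented for illustration and are not part of the course materials.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented sample: one root element, nested tags closed in order, quoted attribute.
GOOD = '<letter lang="fr"><salutation>Cher ami</salutation><body>Merci.</body></letter>'
# Same idea, but <letter> is closed before <salutation>: the tags overlap.
BAD = '<letter lang="fr"><salutation>Cher ami</letter></salutation>'

if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
```

Well-formedness is only the first check: whether a document also follows a given vocabulary such as TEI is a separate question of validation against a schema.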

Course taught in French

TEI (basics) - Elina Leblanc

An introduction to TEI and its basics through the encoding of several prose texts. This course follows directly on from the XML course and provides a concrete application of its rules.

Course taught in French

Introduction to Image Processing – Peter Stokes

This course will provide an introduction to digital images and image processing for people working with books and manuscripts. When working with books, manuscripts and documents today, it is almost inevitable that digital images will be involved: whether for private analysis, to prepare a transcription, or for publication as part of a digital edition or other purpose. In order to get the most out of these images, it is important to understand what they are and how they relate to the original object. In this course, then, we will discuss topics such as spatial and colour resolution; colour calibration; RGB colour; evaluating quality of digital images; and some basic techniques for image enhancement and analysis.
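
To make the notions of spatial and colour resolution concrete, here is a small arithmetic sketch in Python. The page size, scanning resolution and bit depth are illustrative assumptions, not values from the course, and `pixel_dimensions` and `uncompressed_bytes` are hypothetical helper names.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel width and height of a page measured in inches, scanned at `ppi` pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int,
                       bits_per_channel: int = 8, channels: int = 3) -> int:
    """Size of the raw pixel data: one sample per channel per pixel (RGB has 3 channels)."""
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # Assumed example: an 8 x 10 inch page scanned at 300 ppi in 24-bit RGB.
    w, h = pixel_dimensions(8.0, 10.0, 300)
    print(w, h)
    print(uncompressed_bytes(w, h))
```

For the assumed 8 × 10 inch page at 300 ppi this gives 2400 × 3000 pixels and 21,600,000 bytes of raw pixel data before any compression; doubling the bit depth to 16 bits per channel doubles that figure.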

Course taught in French

Tuesday 27/06

TEI Modelling - Elena Pierazzo

Modelling is the activity of expressing the questions of a research project in a formal way that software can handle. The course will help participants select the most suitable TEI markup for their own research project using the Roma software.

Course taught in French

HTML and CSS - Laura Antonietti

This course gives students the theoretical and practical basics of the HTML and CSS languages: after a short theoretical introduction, a series of exercises will help participants learn how to create simple Web pages (structure and style).

Course taught in French

Wednesday 28/06

TEI Transcription and edition - Elena Pierazzo -- FULL!!

The course aims to provide a solid introduction to the transcription of primary sources (manuscripts, printed books, other documents) in TEI. From transcription, we will move on to editing, with the markup for normalization and editorial regularization and the markup for building a critical apparatus.

Course taught in French or in Italian (depending on the participants' requests).

EpiDoc - TEI for Epigraphy and Inscriptions – Emmanuelle Morlock

This EpiDoc workshop is divided into two half-days. The first half-day will present the main TEI concepts and markup used to carry over into the digital world the traditional epigraphic approaches to the transcription, analysis, description and classification of inscriptions. The second half-day will introduce the other components of the "EpiDoc method", which are geared towards Web publication, interoperability and community exchange about practices.

Course taught in French.

Linked Open Data with Recogito – Valeria Vitale

This class will introduce Recogito, an online tool developed by Pelagios Commons, to identify and annotate named entities in historical documents and, in particular, to enable geotagging and georesolution of place references via Linked Open Data. The participants will be walked step by step through the creation of semantic annotations: from the choice of sources to the use of automatic recognition, and from the disambiguation of annotations to the different data visualisation options. The students will annotate text, images and tabular data through a simple interface, both individually and in simultaneous collaboration. The annotations will then be exported in various standard formats, including CSV, RDF/XML, TEI XML and GeoJSON, ready for further processing.

Course taught in Italian

GIS - Andreas Nijenhuis-Bescher and Julien Caranton

From the data to the map - GIS for research

The course "from archives to map" aims to show the path from research to its representation. Cartography can translate data collected in archives, in databases or in the literature into a spatial representation.

This course retraces the different phases that lead from scientific research to the production of a map, using a Geographic Information System (GIS).

Based on a concrete example, the course shows the stages of building a database and representing its contents.

Course taught in French.

Thursday 29/06

NLP - Hervé Blanchon, Laurent Besacier and Gilles Sérasset

Session 1: Introduction to natural language processing (Hervé Blanchon)

In the first part, I will give an overview of the applications of natural language processing (analysis, generation, translation, information retrieval, text mining, alignment…), the problems encountered (ambiguity, incompleteness), and the different approaches (expert, empirical).

Session 2: Machine Translation and Analysis (Hervé Blanchon)

In the second part, I will go into more detail on machine translation and text analysis, presenting the methods and tools and their potential applications. During this session, I will try to point out some tools that are available to the scientific community.

Session 3: Lexical Resources (Didier Schwab)

In this session, we will look at several monolingual and multilingual lexical resources that we work with in our research. We will discuss their characteristics, the way they are built and how they are used for different natural language processing tasks. We will focus in particular on WordNet, BabelNet, DBnary, and distributed representations (word2vec, GloVe, Baroni vectors…).

Session 4: Equipping under-resourced languages and documenting endangered languages: two different challenges for spoken language engineering (Laurent Besacier)

In this session, I will begin by defining two different concepts: under-resourced languages and endangered languages. Under-resourced languages are an important societal and economic challenge: the objective is to provide these languages with tools and resources for natural language processing. I will introduce some of the LIG's contributions on this theme in the development of voice technologies (especially speech recognition). Endangered languages pose a different problem: the objective is to document and describe languages that are likely to disappear in the near future, or to contribute to their revival while there is still time. Here, technologies (speech recognition, mobile applications) can help field linguists in their work of documentation and description.

Courses taught in French.

Lemmatization and Treebanking (Latin) - Eleonora Litta and Marco Passarotti

Linguistic resources and NLP tools for Latin.

The course aims to give participants the basics of linguistic resources and natural language processing tools for Latin.

A short introduction will present the essential concepts and the specialised terminology, in particular the levels of metalinguistic annotation and the different types of linguistic resources. The annotation styles used for syntactically annotated corpora will be described. In this context, the course will present two types of resources for Latin: dependency treebanks and lexicons. A short training session on querying treebanks with two different query languages is envisaged. As for Latin natural language processing tools, the course will present how a morphological analyser (Lemlat) works and, in particular, its latest extension dedicated to morphological derivation. Next, the course will focus on the methods and tools (with evaluation) for morpho-syntactic and syntactic analysis, looking at some of the main open problems. Finally, some specific resources and tools will be described along with their practical applications.

Course taught in Italian.

RDF and Linked Data – Fabio Ciotti

2-day course

This workshop aims to provide a theoretical introduction and a first hands-on approach to the new methods and tools for the semantic representation of information and for knowledge management. Particular attention will be devoted to their deployment within textual studies. The course includes:
- Semantic web: principles, architectures and languages
- RDF: principles, data model and syntax
- Formal ontologies and OWL; examples of ontologies for social sciences and humanities; a tool for editing ontologies: Protégé
- XML and TEI as a tool for semantic annotation
- Semantic methods and tools for semantic annotation: Web Annotation Data Model
- Linked data: publishing and querying web-based knowledge bases; elements of SPARQL
Theoretical aspects will be accompanied by hands-on sessions in which participants will acquire basic operational skills.

Course taught in Italian

Friday 30/06

XSLT - Elena Pierazzo

XSLT is one of the most widely used languages for converting XML files into HTML Web pages. The course will consist of an introduction to transformation models (templates); the basics of XPath; functions and operations on numbers and strings; conditional processing; and for-each loops.

Course taught in French or in Italian (depending on the participants' requests).
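
The ideas named in this description (selecting nodes with a path expression, looping over them, testing a condition, producing HTML) can be sketched outside XSLT. The following is a Python analogue built on the limited XPath subset of the standard `xml.etree.ElementTree` module; it is not XSLT and not the course's own material. The TEI-like sample and the `to_html` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented TEI-like fragment, not taken from the course materials.
SOURCE = """<text>
  <p n="1">Arma virumque cano</p>
  <p n="2"></p>
  <p n="3">Troiae qui primus ab oris</p>
</text>"""


def to_html(xml_text: str) -> str:
    """Turn each non-empty <p> child of the root into an HTML list item (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    items = []
    # Comparable in spirit to an XSLT for-each over the path "p":
    # visit each matching child element in document order.
    for p in root.findall("./p"):
        content = (p.text or "").strip()
        number = p.get("n")
        # Comparable in spirit to an XSLT conditional: skip empty paragraphs.
        if content:
            items.append('<li id="p{}">{}</li>'.format(number, escape(content)))
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    print(to_html(SOURCE))
```

In XSLT the same logic would be written declaratively inside a template rather than as an explicit Python loop; the sketch only shows the shape of the transformation.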

Lemmatization and Treebanking (Ancient Greek) - Francesco Mambrini

The course focuses on the main resources, either published or in development, for the morphosyntactic analysis and annotation of Ancient Greek. In particular, we will illustrate the methods, aims and status of the Ancient Greek Dependency Treebank, the most comprehensive annotated corpus of Greek literary texts from the Archaic and Classical Age. We will discuss the outcome of the first experiment in applying Natural Language Processing tools to annotation tasks such as lemmatization, morphological analysis and syntactic parsing. In this respect, we will focus mainly on the open problems, as well as on the most distinctive features of the ancient texts and of the Greek language that affect the performance of NLP tools. Finally, we will discuss the potential interaction between the Greek treebanks and other digital resources (such as gazetteers, lexicons, and WordNets) that complement the syntactic annotation with some selected semantic properties.
