alpi is a Perl script that helps users install Alpage software locally. It can also be used to install the whole Alpage linguistic processing chain for French. The script is reasonably user-friendly, since it detects and installs software prerequisites (such as non-standard Perl packages).
The Alpage team develops and maintains a full-featured linguistic processing chain for French (see our online Demos). This chain relies on DyALog, FRMG, Lefff and SxPipe (see below for a description of these tools and resources).
A set of tools for the automatic construction of efficient parsers from syntactic descriptions. SYNTAX handles several formalisms, such as (deterministic and non-deterministic) CFGs, TAGs, LFGs and RCGs.
The ALPAGE Linguistic Workbench provides several modules for setting up and using a linguistic processing chain, in particular for French, including the shallow processing chain SxPipe and the POS tagger MElt.
MElt is a freely available (LGPL) state-of-the-art sequence labeller aimed at generating morphosyntactic (POS) taggers trained on both annotated corpora and external lexicons. MElt is provided with a state-of-the-art tagging model for French, as well as tagging models for other languages (English, Spanish, Italian, German). MElt also includes a normalization wrapper that helps process noisy text, such as user-generated data retrieved from the web (French and English only).
SxLFG is a parser generator for Lexical Functional Grammars (LFG) that relies on SYNTAX.
Atelier pour les LEXiques INformatiques et leur Acquisition (Workbench for electronic lexica and their acquisition) - Development of morphological and syntactic lexica for NLP. Includes tools as well as several lexica: the Lefff (French), the Leffe (Spanish), PolLex (Polish), SkLex (Slovak), DeLex (German), PerLex (Persian), KurLex (Kurmanji Kurdish) and SoraLex (Sorani Kurdish). Two other freely available lexica have been imported into the Alexina architecture: the morphological lexicon for Dutch distributed within the Alpino project, and the Morph-it! lexicon for Italian.
The WOLF (Wordnet Libre du Français) is a freely available semantic lexicon (wordnet) for French.
Treebank built on data extracted from French social media (Facebook, Twitter) and French forums (Doctissimo, JeuxVideos.com). The main interest of this corpus is that it provides annotated data for texts whose quality ranges from medium to very noisy.
The corpus contains 3200 French sentences, taken from Europarl, the Est Républicain newspaper, the French Wikipedia and the European Medicines Agency. Each sentence is annotated for part of speech and phrase structure, following the French Treebank guidelines. The constituency trees were then automatically converted to dependency trees.