x.ent (version 1.1.6)

xconfig:

Description

This function lets users configure the entire system: the paths to the corpus, the evaluation file, the result file, the dictionaries, and so on.

Usage

xconfig(json_path="")

Arguments

json_path
Path to the configuration file (*.json)

Details

System configuration
x.ent uses a JSON file to configure the entire system. The structure of the configuration file is complex and has many entries. To make it easier to manage, the package provides a web-based interface that uses JavaScript on the client side and R code on the server side to update the data in the configuration file. The entries in the configuration file are:
corpus
A path to the directory containing the corpus (text or xml)
eval
A path to the evaluation file
result
A path to the file that will store the results
dico
Information about a list of dictionaries. Each dictionary has the following format: the original word followed by the transformations of this word (singular, plural, unaccented form, synonyms and acronyms). For example, in a dictionary of plants:
wheat:N:Wheat:WHEAT:Wheats:Triticum:Durum wheat:Common wheat:
durum wheat:L:DURUM:T. durum:Triticum durum:Triticum turgidum:durums wheats:durum wheat:macaroni wheat:
The letter N (node) indicates that this category (wheat) may have subcategories (Durum wheat, Common wheat, ...).
The letter L (leaf) indicates a leaf of a node.
In this entry, the following information has to be configured:
- tag: a name used to mark results, e.g. p for plant, m for disease
- file: a path to a dictionary file
- node: whether the dictionary contains nodes (N)
- col_key: the column in the dictionary that contains the original word
- col_val: the columns in the dictionary that are searched for in the corpus
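The colon-separated dictionary format can be explored with a few lines of base R. This is a minimal sketch, not the parser used by x.ent: the helper `parse_dico_line` is hypothetical, and it assumes that the plant example consists of two lines (a node line and a leaf line), that column 1 holds the original word and that column 2 holds the N/L flag.

```r
# Two dictionary lines in the colon-separated format described above
# (assumption: the node entry and the leaf entry are separate lines).
dico_lines <- c(
  "wheat:N:Wheat:WHEAT:Wheats:Triticum:Durum wheat:Common wheat:",
  "durum wheat:L:DURUM:T. durum:Triticum durum:Triticum turgidum:durums wheats:durum wheat:macaroni wheat:"
)

# Hypothetical helper: split one line into its key, its N/L flag and
# the remaining variants. col_key mirrors the col_key setting above.
parse_dico_line <- function(line, col_key = 1) {
  # strsplit() drops the empty field produced by the trailing ":".
  fields <- trimws(strsplit(line, ":", fixed = TRUE)[[1]])
  list(
    key      = fields[col_key],
    type     = fields[2],               # "N" (node) or "L" (leaf)
    variants = fields[-c(col_key, 2)]   # everything else on the line
  )
}

entries <- lapply(dico_lines, parse_dico_line)
str(entries)
```

Listing the variants this way is a quick check on which columns to pass as col_val before running an extraction.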
unitex
Unitex is a tool for building grammars and extracting data with the grammars you have built. To use this feature, download Unitex from http://www-igm.univ-mlv.fr/~unitex/index.php?page=3&html=download2.html. In this entry, the following information has to be configured:

- system

  1. tool_unitex: the full path to the Unitex tool. The tool is named "UnitexToolLogger" and can be found in the installation directory.
  2. main_graph: a grammar that you have built, represented as a graph in Unitex. An application can have many graphs, so a main graph is needed to link all the sub-graphs.
  3. my_unitex: the location where the local data of Unitex is stored
  4. dico: a list of Unitex dictionaries

- result

  1. tag: a name used to mark results
  2. tag_unitex: a tag used to mark results in Unitex
  3. get: the number of results to retrieve: 1, 2, ... or all, counted from the first position in the document

relation
You can create relations between entities, such as the relation between plants and diseases. The following information has to be configured:
  1. type: there are two options: structure (relation extraction that follows the document structure) and cooccurrence.
  2. left, right: these parameters are used in the cooccurrence mode; they set up a window to the left and to the right of the root entity.
  3. root: root of relation, e.g. p for plant
  4. negative: an entity used to identify whether the relation is negative or positive
  5. link: details of the relation, ex: plant:disease => p:m.
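The cooccurrence mode can be pictured as a window search around each root entity. The sketch below is an illustration in base R, not the x.ent implementation: the helper `cooccurrence` is hypothetical, and it assumes the window is measured in tokens, which the text above does not specify.

```r
# Toy sequence of entity tags in document order ("" = untagged token).
# "p" = plant and "m" = disease, following the tag examples above.
tags <- c("", "m", "", "p", "", "", "m", "", "", "", "m")

# Hypothetical helper: for each root entity at position i, collect the
# positions of the target tag inside the window [i - left, i + right].
cooccurrence <- function(tags, root, target, left, right) {
  roots <- which(tags == root)
  pairs <- lapply(roots, function(i) {
    lo <- max(1, i - left)
    hi <- min(length(tags), i + right)
    window <- lo:hi
    hits <- window[tags[window] == target]
    if (length(hits) == 0) return(NULL)
    data.frame(root_pos = i, target_pos = hits)
  })
  # rbind() skips the NULL entries of roots without any match.
  do.call(rbind, pairs)
}

# Link p:m with a window of 2 tokens to the left and 3 to the right.
rel <- cooccurrence(tags, root = "p", target = "m", left = 2, right = 3)
print(rel)
```

With these toy tags, the plant at position 4 is paired with the diseases at positions 2 and 7, while the disease at position 11 falls outside the window.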

avoid
Some passages of a document may need to be excluded from the search; this feature handles that. Create a file with the following format: key word..phrase or key word..end.
- phrase: skip from the key word to the end of the paragraph
- end: skip from the key word to the end of the file
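One possible reading of these rules is sketched below in base R. It is not the x.ent implementation: the helper `apply_avoid`, the rule strings and the toy paragraphs are all hypothetical, and the sketch assumes one rule per entry, one paragraph per vector element, and that only the first occurrence of a key word is handled.

```r
# Hypothetical rules in the "key word..phrase" / "key word..end" format.
rules <- c("Acknowledgements..phrase", "References..end")

# Toy document: one element per paragraph.
paragraphs <- c(
  "Wheat rust was observed in May.",
  "Results are good. Acknowledgements to the field team.",
  "Barley was healthy.",
  "References Smith 2010.",
  "Appendix text."
)

apply_avoid <- function(paragraphs, rules) {
  for (rule in rules) {
    parts <- strsplit(rule, "..", fixed = TRUE)[[1]]
    key   <- parts[1]
    scope <- parts[2]                       # "phrase" or "end"
    hit <- which(grepl(key, paragraphs, fixed = TRUE))[1]
    if (is.na(hit)) next                    # key word not found
    # Keep only the text before the key word in the matching paragraph.
    pos <- regexpr(key, paragraphs[hit], fixed = TRUE)
    paragraphs[hit] <- trimws(substr(paragraphs[hit], 1, pos - 1))
    # For "end", also drop every paragraph after the matching one.
    if (scope == "end") paragraphs <- paragraphs[seq_len(hit)]
  }
  paragraphs[nzchar(paragraphs)]            # remove emptied paragraphs
}

kept <- apply_avoid(paragraphs, rules)
print(kept)
```

Here the acknowledgement sentence is cut from the second paragraph, and everything from "References" onward is dropped.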
replace
A path to a file that contains the words to be replaced. The format is: words_will_be_replaced:list_words_need_replacing
stopword
A path to a file that contains the list of stop words
blacklist
A path to a file that contains, for each entity, the list of words that should not appear in the results

Examples

  xconfig()
  xconfig("C:/JSON/ini.json")