FreeLing
3.0
Class "lang_ident" checks a text against all known languages and sorts the results by probability.
#include <lang_ident.h>
Public Member Functions | |
| lang_ident (const std::wstring &) | |
| Build a language identifier, reading its options from the given file. | |
| void | add_language (const std::wstring &) |
| Load a language from the given file and add it to the known languages. | |
| void | train_language (const std::wstring &, const std::wstring &, const std::wstring &) |
| Train a model for a language, store it in modelFile, and add it to the list of known languages. | |
| std::wstring | identify_language (const std::wstring &, const std::set< std::wstring > &) const |
| Classify the input text and return the code of the best language (or "none"). | |
| void | rank_languages (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const |
| Fill a vector with the probability of each language, sorted. | |
Private Member Functions | |
| void | language_probabilities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const |
| Fill a vector with the probability of each language, unsorted. | |
Private Attributes | |
| std::map< std::wstring, idioma > | idiomes |
| List of known languages. | |
| std::set< std::wstring > | all_known_languages |
| double | Threshold |
| Likelihood threshold for considering a text as belonging to a language. | |
| double | ScaleFactor |
| Scale factor used to correct the likelihood of each language. | |
Class "lang_ident" checks a text against all known languages and sorts the results by probability.
It creates an instance of "idioma" for each known language, and checks input text against all existing instances.
lang_ident::lang_ident (const std::wstring &)
Build a language identifier, reading its options from the given file.
void lang_ident::add_language (const std::wstring &)
Load a language from the given file and add it to the known languages.
std::wstring lang_ident::identify_language (const std::wstring &, const std::set< std::wstring > &) const
Classify the input text and return the code of the best language (or "none").
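The "best language or none" behaviour can be illustrated with a small stand-alone sketch. This is not FreeLing code: `best_or_none` and its `threshold` parameter are hypothetical names, and the rule that a score must reach the threshold to be accepted is an assumption based on the description of the Threshold attribute.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, not part of FreeLing: pick the best-scoring language
// code from (score, code) pairs, or L"none" if no score reaches the threshold.
std::wstring best_or_none(const std::vector<std::pair<double, std::wstring>>& scores,
                          double threshold) {
  assert(!std::isnan(threshold));  // a NaN threshold would reject every score

  std::wstring best = L"none";
  double best_score = threshold;  // assumption: scores below this are rejected
  for (const auto& entry : scores) {
    if (entry.first >= best_score) {
      best_score = entry.first;
      best = entry.second;
    }
  }
  return best;
}
```

Called with the pairs (0.2, "en"), (0.7, "es"), (0.1, "ca") and a threshold of 0.5, the sketch returns "es"; if no score reaches 0.5, it returns "none".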
void lang_ident::language_probabilities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const [private]
Fill a vector with the probability of each language, unsorted.
void lang_ident::rank_languages (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
Fill a vector with the probability of each language, sorted.
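The shape of the output vector, pairs of (probability, language code) in sorted order, can be sketched without the library. Everything below is a hypothetical stand-in rather than FreeLing code: `rank_sketch` and `scored_lang` are invented names, the `scores` map replaces the per-language models, and two behaviours are assumptions not stated on this page: that an empty language set means "all known languages" and that the best score comes first.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using scored_lang = std::pair<double, std::wstring>;

// Hypothetical stand-in, not part of FreeLing. `scores` maps a language code
// to a precomputed score and plays the role of the per-language models.
void rank_sketch(std::vector<scored_lang>& result,
                 const std::map<std::wstring, double>& scores,
                 const std::set<std::wstring>& wanted) {
  result.clear();
  for (const auto& kv : scores) {
    // Assumption: an empty `wanted` set means "consider every known language".
    if (wanted.empty() || wanted.count(kv.first) > 0) {
      result.emplace_back(kv.second, kv.first);
    }
  }
  // Assumption: "sorted" means best score first. Pairs compare by score,
  // then by language code, so ties are ordered deterministically.
  std::sort(result.begin(), result.end(), std::greater<scored_lang>());
  assert(result.size() <= scores.size());  // filtering never adds entries
}
```

Putting the score first in each pair is what lets the default pair comparison order the vector by score, which may be why the real interface uses `std::pair< double, std::wstring >` rather than the reverse.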
void lang_ident::train_language (const std::wstring &, const std::wstring &, const std::wstring &)
Train a model for a language, store it in modelFile, and add it to the list of known languages.
std::set<std::wstring> lang_ident::all_known_languages [private]
std::map<std::wstring,idioma> lang_ident::idiomes [private]
List of known languages.
double lang_ident::ScaleFactor [private]
Scale factor used to correct the likelihood of each language.
double lang_ident::Threshold [private]
Likelihood threshold for considering a text as belonging to a language.