Class "lang_ident" checks a text against all known languages and sorts the results by probability.

#include <lang_ident.h>

Public Member Functions
	lang_ident (const std::wstring &)
	Build a language identifier, reading options from the given file.
void	add_language (const std::wstring &)
	Load the given language from the given file and add it to the existing languages.
void	train_language (const std::wstring &, const std::wstring &, const std::wstring &)
	Train a model for a language, store it in modelFile, and add it to the list of known languages.
std::wstring	identify_language (const std::wstring &, const std::set< std::wstring > &) const
	Classify the input text and return the code of the best language (or "none").
void	rank_languages (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
	Fill a vector with the sorted probabilities of each language.
Private Member Functions
void	language_probabilities (std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > &) const
	Fill a vector with the unsorted probabilities of each language.
Private Attributes
std::map< std::wstring, idioma >	idiomes
	List of known languages.
std::set< std::wstring >	all_known_languages
double	Threshold
	Likelihood threshold for considering a text as belonging to a language.
double	ScaleFactor
	Scale factor used to correct the likelihood of each language.

Detailed Description

Class "lang_ident" checks a text against all known languages and sorts the results by probability.

It creates an instance of "idioma" for each known language, and checks input text against all existing instances.
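To make the "one model per language" design concrete, here is a minimal C++17 sketch under stated assumptions. CharModel and score_all are hypothetical names, and the character-unigram scoring is a swapped-in stand-in, not the model that "idioma" actually uses. Each score is an average log-probability per character, so a higher (less negative) value means a better match; the result is left unsorted, as described for language_probabilities.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "idioma": a character-unigram model.
// This is NOT FreeLing's model; it only illustrates "one model per language".
class CharModel {
 public:
  explicit CharModel(const std::wstring &sample) : total_(sample.size()) {
    for (wchar_t c : sample) counts_[c] += 1.0;  // count each character
  }

  // Average log-probability per character, using add-one smoothing that
  // reserves one extra vocabulary slot for characters never seen in training.
  double score(const std::wstring &text) const {
    if (text.empty()) return 0.0;
    const double vocab = counts_.size() + 1.0;
    double logp = 0.0;
    for (wchar_t c : text) {
      auto it = counts_.find(c);
      const double n = (it == counts_.end()) ? 0.0 : it->second;
      logp += std::log((n + 1.0) / (total_ + vocab));
    }
    return logp / text.size();
  }

 private:
  std::map<wchar_t, double> counts_;
  double total_;
};

// Check the text against every known language model.
// The returned (score, language-code) pairs are in map order, i.e. unsorted by score.
std::vector<std::pair<double, std::wstring>>
score_all(const std::map<std::wstring, CharModel> &models, const std::wstring &text) {
  std::vector<std::pair<double, std::wstring>> result;
  for (const auto &entry : models)
    result.emplace_back(entry.second.score(text), entry.first);
  return result;
}
```

Keeping the models in a std::map keyed by language code mirrors the idiomes attribute, so adding a language is a single insertion and scoring is one pass over the map.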

Constructor & Destructor Documentation

lang_ident::lang_ident ( const std::wstring & )

Build a language identifier, reading options from the given file.

Member Function Documentation

void lang_ident::add_language ( const std::wstring & )

Load the given language from the given file and add it to the existing languages.

std::wstring lang_ident::identify_language ( const std::wstring &, const std::set< std::wstring > & ) const

Classify the input text and return the code of the best language (or "none").
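A minimal sketch of how the best language might be chosen from a vector of (score, code) pairs. The helper pick_best is hypothetical, and two behaviours are assumptions rather than documented facts: an empty language set means every known language is a candidate, and a best score below the threshold (compare the Threshold attribute) yields "none".

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: return the code of the best-scoring candidate language,
// or L"none" if no candidate exists or the best score is below `threshold`.
// Assumption: an empty `candidates` set means "consider every language".
std::wstring pick_best(const std::vector<std::pair<double, std::wstring>> &probs,
                       const std::set<std::wstring> &candidates,
                       double threshold) {
  const std::pair<double, std::wstring> *best = nullptr;
  for (const auto &p : probs) {
    // Skip languages outside the requested subset (if one was given).
    if (!candidates.empty() && candidates.count(p.second) == 0) continue;
    if (best == nullptr || p.first > best->first) best = &p;
  }
  if (best == nullptr || best->first < threshold) return L"none";
  return best->second;
}
```

Only a linear scan is needed here, because picking a single winner does not require sorting the whole vector.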

void lang_ident::language_probabilities ( std::vector< std::pair< double, std::wstring > > &, const std::wstring &, const std::set< std::wstring > & ) const [private]
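The member list suggests that rank_languages differs from language_probabilities only in ordering its output. Assuming rank_languages sorts the same (probability, code) pairs best-first, a sketch with the hypothetical helper sort_by_probability could look like this:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sort (probability, language-code) pairs best-first.
// std::greater compares pairs lexicographically, so pairs are ordered by
// probability (descending); equal probabilities fall back to the code (descending).
void sort_by_probability(std::vector<std::pair<double, std::wstring>> &probs) {
  std::sort(probs.begin(), probs.end(),
            std::greater<std::pair<double, std::wstring>>());
}
```

Putting the probability first in each pair, as the documented signatures do, lets the standard pair comparison drive the sort without a custom comparator.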