FreeLing  3.0
probabilities Class Reference

Class probabilities sets lexical probabilities for each PoS tag of each word in a sentence.

#include <probabilities.h>

Public Member Functions

 probabilities (const std::wstring &, const std::wstring &, double)
 Constructor.
void annotate_word (word &)
 Assign probabilities for each analysis of given word.
void set_activate_guesser (bool)
 Turn guesser on/off.
void analyze (sentence &)
 Assign probabilities to tags for each word in sentence.
void analyze (std::list< sentence > &)
 Assign probabilities to tags for each word in sentences.
sentence analyze (const sentence &)
 Assign probabilities to tags for each word in sentence, return copy.
std::list< sentence > analyze (const std::list< sentence > &)
 Assign probabilities to tags for each word in sentences, return copy.

Private Member Functions

void smoothing (word &)
 Smooth probabilities for the analysis of given word.
double compute_probability (const std::wstring &, double, const std::wstring &)
 Compute p(tag|suffix) using recursively shorter suffixes.
double guesser (word &, double)
 Guess possible tags, keeping some mass for previously assigned tags.

Private Attributes

boost::u32regex RE_PunctNum
 Auxiliary regexps.
double ProbabilityThreshold
 Probability threshold for unknown words tags.
std::wstring Language
double BiassSuffixes
 Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words.
double LidstoneLambda
 lambda parameter for smoothing via Lidstone's Law
bool activate_guesser
 whether to use guesser for unknown words.
std::map< std::wstring, double > single_tags
 unigram probabilities
std::map< std::wstring,
std::map< std::wstring, double > > 
class_tags
 probabilities for usual ambiguity classes
std::map< std::wstring,
std::map< std::wstring, double > > 
lexical_tags
 lexical probabilities for frequent words
std::map< std::wstring, double > unk_tags
 list of tags and probabilities to assign to unknown words
std::map< std::wstring,
std::map< std::wstring, double > > 
unk_suffs
 list of tag frequencies for unknown word suffixes
double theeta
 Smoothing parameter for unknown-word suffixes.
std::wstring::size_type long_suff
 length of longest suffix

Detailed Description

Class probabilities sets lexical probabilities for each PoS tag of each word in a sentence.


Constructor & Destructor Documentation

probabilities::probabilities ( const std::wstring &  Lang,
const std::wstring &  probFile,
double  Threshold 
)

Constructor.

Create a probability assignment module, loading the appropriate file.

References ERROR_CRASH, util::open_utf8_file(), RE_FZ, TRACE, and util::wstring2double().


Member Function Documentation

void probabilities::analyze ( sentence &  se ) [virtual]

Assign probabilities to tags for each word in sentence.

Annotate probabilities for each analysis of each word in given sentence, using given options.

Implements processor.

References TRACE_SENTENCE.

Referenced by maco::analyze().

void probabilities::analyze ( std::list< sentence > &  )

Assign probabilities to tags for each word in sentences.

Reimplemented from processor.

sentence probabilities::analyze ( const sentence &  )

Assign probabilities to tags for each word in sentence, return copy.

Add probabilities to words in given sentence, return copy.

Reimplemented from processor.

References processor::analyze().

std::list<sentence> probabilities::analyze ( const std::list< sentence > &  )

Assign probabilities to tags for each word in sentences, return copy.

Reimplemented from processor.

void probabilities::annotate_word ( word &  )

Assign probabilities for each analysis of given word.

Annotate probabilities for each analysis of given word.

References word::find_tag_match(), word::found_in_dict(), word::get_form(), word::get_n_analysis(), word::has_retokenizable(), word::select_all_analysis(), and TRACE.

double probabilities::compute_probability ( const std::wstring &  tag,
double  prob,
const std::wstring &  s 
) [private]

Compute p(tag|suffix) using recursively shorter suffixes.

Compute probability of a tag given a word suffix.

References util::double2wstring(), and TRACE.

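double probabilities::guesser ( word &  w,
double  mass 
) [private]

Guess possible tags, keeping some mass for previously assigned tags.

void probabilities::set_activate_guesser ( bool  )

Turn guesser on/off.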

void probabilities::smoothing ( word &  w ) [private]

Smooth probabilities for the analysis of given word.

If using backoff, combine with suffix information to get a better estimate.

References word::get_form(), word::get_lc_form(), word::get_n_analysis(), TRACE, and WARNING.


Member Data Documentation

bool probabilities::activate_guesser [private]

whether to use guesser for unknown words.

double probabilities::BiassSuffixes [private]

Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words.

std::map<std::wstring,std::map<std::wstring,double> > probabilities::class_tags [private]

probabilities for usual ambiguity classes

std::wstring probabilities::Language [private]

std::map<std::wstring,std::map<std::wstring,double> > probabilities::lexical_tags [private]

lexical probabilities for frequent words

double probabilities::LidstoneLambda [private]

lambda parameter for smoothing via Lidstone's Law

std::wstring::size_type probabilities::long_suff [private]

length of longest suffix

double probabilities::ProbabilityThreshold [private]

Probability threshold for unknown words tags.

boost::u32regex probabilities::RE_PunctNum [private]

Auxiliary regexps.

std::map<std::wstring,double> probabilities::single_tags [private]

unigram probabilities

double probabilities::theeta [private]

Smoothing parameter for unknown-word suffixes.

std::map<std::wstring,std::map<std::wstring,double> > probabilities::unk_suffs [private]

list of tag frequencies for unknown word suffixes

std::map<std::wstring,double> probabilities::unk_tags [private]

list of tags and probabilities to assign to unknown words


The documentation for this class was generated from the following files: