FreeLing 3.0
The class np implements a simple proper noun recognizer.
#include <np.h>


Public Member Functions | |
| np (const std::wstring &) | |
| Constructor. | |
Private Member Functions | |
| int | ComputeToken (int, sentence::iterator &, sentence &) |
| Compute the right token code for word j from given state. | |
| void | ResetActions () |
| Reset flag about capitalized noun at sentence start. | |
| void | StateActions (int, int, int, sentence::const_iterator) |
| Perform the necessary actions in "state", reached from state "origin" via word j interpreted as code "token". In practice, this sets the flag about a capitalized noun at sentence start. | |
| void | SetMultiwordAnalysis (sentence::iterator, int) |
| Set the appropriate lemma and tag for the new multiword. | |
Private Attributes | |
| std::set< std::wstring > | func |
| set of function words | |
| std::set< std::wstring > | punct |
| set of special punctuation tags | |
| std::set< std::wstring > | names |
| set of words to be considered possible NPs at sentence beginning | |
| std::map< std::wstring, int > | ignore_tags |
| set of words/tags to be ignored as NE parts, even if they are capitalized | |
| std::map< std::wstring, int > | ignore_words |
| std::set< std::wstring > | prefixes |
| sets of NE affixes | |
| std::set< std::wstring > | suffixes |
| bool | initialNoun |
| whether there is a noun at the beginning of the sentence | |
| boost::u32regex | RE_NounAdj |
| boost::u32regex | RE_Closed |
| boost::u32regex | RE_DateNumPunct |
The class np implements a simple proper noun recognizer.
np::np (const std::wstring & npFile)
Constructor.
Create a proper noun recognizer.
References ERROR_CRASH, automat::Final, func, ignore_tags, ignore_words, automat::initialState, util::is_capitalized(), MAX_STATES, MAX_TOKENS, names, util::open_utf8_file(), prefixes, punct, RE_CLO, RE_Closed, RE_DateNumPunct, RE_DNP, RE_NA, RE_NounAdj, ST_FUN, ST_IN, ST_NP, ST_PREF, ST_STOP, ST_SUF, automat::stopState, suffixes, TK_mFun, TK_mPref, TK_mSuf, TK_mUpper, TK_sNounUpp, TK_sUnkUpp, TRACE, automat::trans, and WARNING.
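The reference list above suggests that the constructor fills a state-transition table (automat::trans, automat::Final, automat::initialState, automat::stopState). As a rough illustration of that idea, and not FreeLing's actual code, the following minimal sketch builds such a table and scans a token sequence for the longest match. The state and token names echo identifiers from the reference list, but their values, the transitions, and the hypothetical `Automaton` type and `find_spans` helper are all assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical states and token codes: names echo the reference list,
// values and transitions are invented for illustration.
enum State { ST_IN, ST_NP, ST_FUN, ST_STOP, NUM_STATES };
enum Token { TK_mUpper, TK_mFun, TK_other, NUM_TOKENS };

struct Automaton {
    std::array<std::array<State, NUM_TOKENS>, NUM_STATES> trans;
    std::array<bool, NUM_STATES> final_state;

    Automaton() {
        // Every transition leads to the stop state unless set below.
        for (auto &row : trans) row.fill(ST_STOP);
        final_state.fill(false);

        trans[ST_IN][TK_mUpper] = ST_NP;    // a capitalized word opens a name
        trans[ST_NP][TK_mUpper] = ST_NP;    // further capitalized words extend it
        trans[ST_NP][TK_mFun] = ST_FUN;     // a function word may link two parts
        trans[ST_FUN][TK_mUpper] = ST_NP;
        trans[ST_FUN][TK_mFun] = ST_FUN;
        final_state[ST_NP] = true;          // a name must end on a capitalized word
    }

    // Scan left to right and return [first, last] index pairs of the
    // longest match starting at each unconsumed position.
    std::vector<std::pair<std::size_t, std::size_t>>
    find_spans(const std::vector<Token> &toks) const {
        std::vector<std::pair<std::size_t, std::size_t>> spans;
        std::size_t i = 0;
        while (i < toks.size()) {
            State s = ST_IN;
            bool matched = false;
            std::size_t last = i;
            for (std::size_t j = i; j < toks.size() && s != ST_STOP; ++j) {
                s = trans[s][toks[j]];
                if (final_state[s]) { matched = true; last = j; }
            }
            if (matched) {
                spans.push_back({i, last});
                i = last + 1;   // resume after the recognized span
            } else {
                ++i;
            }
        }
        return spans;
    }
};
```

With tokens coded as other, upper, fun, upper, other (for instance "la Universidad de Barcelona abre"), this sketch reports a single span covering positions 1 to 3. A trailing function word is not absorbed, because only ST_NP is final.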
int np::ComputeToken (int state, sentence::iterator & j, sentence & se) [private, virtual]
Compute the right token code for word j from given state.
Reimplemented from ner_module.
References ignore_tags, ignore_words, punct, and TK_other.
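To make the idea of a token code more concrete, here is a minimal sketch of how a word might be mapped to one. It is an illustration under stated assumptions, not the FreeLing implementation: the hypothetical `Lexicon` struct and `compute_token` function stand in for the private sets and the member function, the enum values are invented, and the order of the checks is a guess. Tag-based filtering (ignore_tags, punct) is left out.

```cpp
#include <cwctype>
#include <set>
#include <string>

// Hypothetical token codes: names echo those referenced by np, values are invented.
enum TokenCode { TK_sUnkUpp, TK_mUpper, TK_mFun, TK_other };

// Simplified stand-in for some of the private sets held by np.
struct Lexicon {
    std::set<std::wstring> func;          // function words that may link name parts
    std::set<std::wstring> ignore_words;  // forms never treated as name parts
};

// True if the first character is uppercase (in the current C locale).
bool is_capitalized(const std::wstring &w) {
    return !w.empty() && std::iswupper(static_cast<wint_t>(w[0]));
}

// Assumed order of checks: ignore list first, then capitalization,
// then function words inside the sentence.
TokenCode compute_token(const std::wstring &form, bool sentence_start,
                        const Lexicon &lex) {
    if (lex.ignore_words.count(form)) return TK_other;
    if (is_capitalized(form)) return sentence_start ? TK_sUnkUpp : TK_mUpper;
    if (!sentence_start && lex.func.count(form)) return TK_mFun;
    return TK_other;
}
```

Checking the ignore list before capitalization reflects the description of ignore_words as forms to be skipped "even if they are capitalized".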
void np::ResetActions () [private, virtual]
Reset flag about capitalized noun at sentence start.
Reimplemented from ner_module.
References initialNoun.
void np::SetMultiwordAnalysis (sentence::iterator i, int fstate) [private, virtual]
Set the appropriate lemma and tag for the new multiword.
Reimplemented from ner_module.
References initialNoun, and TRACE.
void np::StateActions (int origin, int state, int token, sentence::const_iterator j) [private, virtual]
Perform the necessary actions in "state", reached from state "origin" via word j interpreted as code "token". In practice, this sets the flag about a capitalized noun at sentence start.
Reimplemented from ner_module.
References initialNoun, util::int2wstring(), ST_NP, TK_sNounUpp, and TRACE.
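The interplay between StateActions and ResetActions can be pictured with a minimal sketch. It assumes, based only on the identifiers referenced above (initialNoun, ST_NP, TK_sNounUpp), that the flag is raised when state ST_NP is reached via a token coded TK_sNounUpp and cleared on reset. The hypothetical `SentenceStartFlag` type, the enum values, and the exact condition are assumptions, not FreeLing's code.

```cpp
// Hypothetical codes: names echo the reference lists, values are invented.
enum State { ST_IN, ST_NP, ST_STOP };
enum Token { TK_sNounUpp, TK_sUnkUpp, TK_mUpper, TK_other };

// Stand-in for the flag handling inside np.
struct SentenceStartFlag {
    bool initialNoun = false;

    // Counterpart of ResetActions: forget what was seen at sentence start.
    void reset_actions() { initialNoun = false; }

    // Counterpart of StateActions: remember that the name began with a
    // capitalized known noun at sentence start (assumed condition).
    void state_actions(State /*origin*/, State state, Token token) {
        if (state == ST_NP && token == TK_sNounUpp) initialNoun = true;
    }
};
```

A recognizer could consult such a flag later, for example when deciding how to tag a one-word match at the start of a sentence, although the actual use in SetMultiwordAnalysis is not documented here.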
std::map<std::wstring,int> np::ignore_tags [private]
set of words/tags to be ignored as NE parts, even if they are capitalized
Referenced by ComputeToken(), and np().
std::map<std::wstring,int> np::ignore_words [private]
Referenced by ComputeToken(), and np().
bool np::initialNoun [private]
whether there is a noun at the beginning of the sentence
Referenced by ResetActions(), SetMultiwordAnalysis(), and StateActions().
std::set<std::wstring> np::names [private]
set of words to be considered possible NPs at sentence beginning
Referenced by np().
std::set<std::wstring> np::prefixes [private]
sets of NE affixes
Referenced by np().
std::set<std::wstring> np::punct [private]
set of special punctuation tags
Referenced by ComputeToken(), and np().
boost::u32regex np::RE_Closed [private]
Referenced by np().
boost::u32regex np::RE_DateNumPunct [private]
Referenced by np().
boost::u32regex np::RE_NounAdj [private]
Referenced by np().
std::set<std::wstring> np::suffixes [private]
Referenced by np().