Natural language processing (NLP) is a field of computer science and linguistics
concerned with the interactions between computers and human languages.
Natural-language understanding is sometimes referred to as an Artificial Intelligence-complete (AI-complete) problem,
because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it.
Modern NLP algorithms are grounded primarily in statistical machine learning.
Research into modern statistical NLP algorithms requires an understanding of a number of disparate fields,
including linguistics, computer science, statistics (particularly Bayesian statistics),
linear algebra and optimization theory.
Earlier implementations of language-processing tasks typically involved
the direct hand-coding of large sets of rules.
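For instance, a caricature of that rule-based style might look like the following sketch; the suffix rules here are invented purely for illustration, not taken from any real system.

```python
# A caricature of the earlier rule-based paradigm: part-of-speech rules
# written directly by hand (illustrative only; real systems had
# thousands of such rules).
def rule_based_tag(word):
    if word.endswith("ly"):
        return "RB"   # adverb
    if word.endswith("ing"):
        return "VBG"  # gerund or present participle
    return "NN"       # fallback: assume a noun
```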
The machine-learning paradigm instead often uses statistical inference to automatically learn such rules through
the analysis of a set of documents that have been hand-annotated with the correct values to be learned.
Such a set of annotated documents is called a corpus.
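As a sketch of what such annotation looks like in practice, a tiny corpus can be written down as plain Python data; the tag names below are an assumption borrowed from the Penn Treebank convention, and any consistent tag set would work the same way.

```python
# A hand-annotated corpus, sketched as Python data: each sentence is a
# list of (word, tag) pairs supplying the "correct values" a learning
# algorithm should reproduce.
corpus = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")],
    [("She", "PRP"), ("reads", "VBZ"), ("books", "NNS"), (".", ".")],
]
```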
As an example, consider the task of determining the correct part of speech
of each word in a given sentence, typically one that has never been seen before.
A typical machine-learning-based implementation of a part-of-speech tagger proceeds in two steps:
a training step and an evaluation step.
The training step makes use of a corpus of training data,
which consists of a large number of sentences, each with the correct part of speech attached to each word.
This corpus is analyzed and a learning model is generated from it,
consisting of automatically created rules for determining the part of speech of a word in a sentence,
typically based on the nature of the word in question,
the nature of the surrounding words, and the most likely parts of speech of those surrounding words.
The generated model has to meet two conflicting objectives simultaneously:
to perform as well as possible on the training data, and to be as simple as possible,
so that it avoids overfitting the training data.
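Here is a minimal sketch of such a training step, assuming NLTK and its bundled sample of the Penn Treebank corpus (the text commits to no particular library or corpus):

```python
# Minimal training-step sketch using NLTK (an assumed library choice)
# and its sample of the Penn Treebank corpus. Requires:
#   pip install nltk
#   python -c "import nltk; nltk.download('treebank')"
import nltk
from nltk.corpus import treebank

tagged_sents = list(treebank.tagged_sents())   # hand-annotated sentences
split = int(len(tagged_sents) * 0.9)
train_sents = tagged_sents[:split]             # used to build the model
test_sents = tagged_sents[split:]              # held back for evaluation

# The unigram tagger learns the most likely tag for each word; the
# bigram tagger layered on top also conditions on the preceding tag,
# backing off to the simpler unigram model when it lacks evidence.
unigram = nltk.UnigramTagger(train_sents)
tagger = nltk.BigramTagger(train_sents, backoff=unigram)
```

The backoff chain is one simple way to balance the two objectives above: the bigram model fits the training data closely where it has evidence, and the unigram fallback keeps behavior simple elsewhere.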
In the evaluation step, the model that has been learned is used to process new, previously unseen data.
It is critical that the data used for testing is not the same as the data used for training;
otherwise, the testing accuracy will be unrealistically high.
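Continuing the sketch above, the evaluation step strips the gold tags from the held-out sentences, re-tags them with the trained model, and compares:

```python
# Evaluation-step sketch (continues the training sketch above; the
# sentences in test_sents were never seen during training).
correct = total = 0
for gold in test_sents:
    words = [w for (w, _) in gold]       # strip the gold tags
    predicted = tagger.tag(words)        # the model's guesses
    for (_, gold_tag), (_, pred_tag) in zip(gold, predicted):
        total += 1
        correct += int(gold_tag == pred_tag)

print(f"held-out tagging accuracy: {correct / total:.3f}")
```

Running the same loop over train_sents instead would report a flattering but misleading number, which is exactly the pitfall noted above.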
Many different classes of machine learning algorithms have been applied to NLP tasks.
Research has focused on statistical models,
which make soft, probabilistic decisions based on attaching real-valued weights to each input feature.
Such models have the advantage that they can express the relative certainty of
many different possible answers rather than only one, producing more reliable results
when such a model is included as a component of a larger system.
In addition, models that make soft decisions are generally more robust when given unfamiliar input,
especially input that contains errors (as is very common for real-world data).
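As a concrete illustration of such soft decisions, the sketch below uses logistic regression from scikit-learn on a toy sentiment task (both the library and the task are assumptions; the text names neither): the model attaches a real-valued weight to each input feature and reports a probability for every candidate answer rather than a single hard label.

```python
# Soft, probabilistic decisions: a logistic-regression toy classifier
# built with scikit-learn (an assumed library; the text names none).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie", "terrible plot", "wonderful acting", "awful film"]
labels = ["pos", "neg", "pos", "neg"]        # toy hand-annotated data

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)          # one real-valued weight per word feature
model = LogisticRegression().fit(X, labels)

# predict_proba exposes the relative certainty of each possible answer,
# which a larger system can weigh rather than trusting one hard label.
probs = model.predict_proba(vectorizer.transform(["great plot"]))[0]
print(dict(zip(model.classes_, probs.round(3))))
```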
This was only an introduction. If you want to learn more, watch the next video. Thanks for your time.