Linguistics as science
· vs philology or engineering
· systematic
· exhaustive
· theory-based/directed
· how much science do we need?
Questions for linguistics
· what is universal?
· what is innate?
· how can the non-innate parts be learned?
· what are the units of language?
· how is language capacity organized into components?
· how can we represent each component?
· how does language relate to meaning and thought?
Components
· phonetics: vocal tract and spectra
· phonology: sound systems for language
· morphology: roots/stems and affixes
· syntax: what can be said
· semantics: looking up and computing meaning
· discourse: beyond single sentences
· pragmatics: how we use language
Data for linguistics
· to the scientist/linguist
· to the child (little linguist)
· for machine learning of language
· words, phrases, sentences or beyond
· marginal, complex sentences
· frequencies
· meaning-utterance pairs: caregiverese
· psycholinguistic experiments
Psycholinguistics
· performance: is it just noise with respect to competence?
· understanding tasks; e.g., attachment
· acquisition: sequences of corpora
· elicited production
· relation to general cognition
· errors in L1 and L2 learning
· L2 tests and MT evaluation
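
The attachment problem mentioned above can be made concrete with a small sketch. It counts the parses a toy grammar assigns to a sentence with a prepositional phrase. The grammar, lexicon, and function name are assumptions chosen for illustration, not taken from any particular experiment or parser.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption).
# Keys are (left child, right child); values are possible parent labels.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words, root="S"):
    """CKY-style chart that stores the number of trees per (start, end, label)."""
    n = len(words)
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i, i + 1, tag] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for (left, right), parents in BINARY.items():
                    n_left = chart.get((start, mid, left), 0)
                    n_right = chart.get((mid, end, right), 0)
                    if n_left and n_right:
                        for parent in parents:
                            chart[start, end, parent] += n_left * n_right
    return chart.get((0, n, root), 0)

print(count_parses("I saw the man with the telescope".split()))
```

Under this grammar the sentence receives two analyses, one with the PP attached to the verb phrase and one with it attached to the noun phrase. Which analysis human listeners prefer is the kind of question an understanding task would probe.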
Modifiers of “linguistics”
· psycho-: see Psycholinguistics above
· socio-: class, subculture
· politics of language and dialect
· historical: what makes language evolve
· comparative
Technical modifiers of “linguistics”
· formal
· statistical
· computational
· and what are NLP (natural language processing) and HLT (human language technology)?
Applying language as input / output / knowledge
· input: database front end (restaurants, airlines)
· output: report generation from data (weather)
· both: machine translation (web, repair manuals)
· knowledge: information extraction
Issues of input / output / knowledge
· i/o reversible, as in Prolog(?)
· is NLG easier than NLU; if so, why?
· habitability (NL unlimited as input)
· limited knowledge representations (KRs) of applications (helps both ways)
Text versus speech
· separate history
· representation: FST vs (augmented or transformed) CFG
· independent and (logically) sequential(?)
· text-only applications: handicap aids, voice i/o
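
As a rough illustration of the finite-state side of the FST vs CFG contrast above, here is a minimal sketch of a transducer written as a small state machine. It maps a lexical form with a morpheme boundary ("+") to a surface spelling. The e-insertion rule, the state names, and the function name are simplifying assumptions for illustration only.

```python
SIBILANTS = set("sxz")

def transduce(lexical):
    """Map a lexical form such as 'fox+s' to a surface form such as 'foxes'.

    The machine has four states: 'other', 'sibilant', 'boundary', and
    'boundary_after_sibilant'. It reads one symbol at a time and emits
    zero or more output symbols per step.
    """
    state, out = "other", []
    for ch in lexical:
        if ch == "+":
            # Morpheme boundary: emit nothing, remember what preceded it.
            state = "boundary_after_sibilant" if state == "sibilant" else "boundary"
        elif state == "boundary_after_sibilant" and ch == "s":
            # Insert 'e' between a sibilant-final stem and the suffix 's'.
            out.append("es")
            state = "sibilant"
        else:
            out.append(ch)
            state = "sibilant" if ch in SIBILANTS else "other"
    return "".join(out)

for form in ["cat+s", "fox+s", "bus+s", "walk+ed"]:
    print(form, "->", transduce(form))
```

A machine like this needs only a fixed, finite memory (the current state). Nested or recursive structure, as in phrase structure, is where a CFG or something stronger is usually brought in.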
Probability and Statistics
· now widely incorporated into symbolic systems
· hidden Markov models for speech recognition
· Bayesian spell-checking
· probabilistic parsers and CFGs
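
To make one of the items above concrete, here is a minimal sketch of Bayesian (noisy-channel) spell-checking: choose the word w that maximizes P(typed | w) * P(w). The toy corpus, the fixed error probability, and all function names are assumptions for illustration. A real system would estimate both models from data.

```python
from collections import Counter

# Toy corpus standing in for a real word-frequency list (an assumption).
CORPUS = "the cat sat on the mat the act was over the cart left".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def prior(word):
    """P(word): relative frequency in the toy corpus."""
    return COUNTS[word] / TOTAL

def edits1(word):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def channel(typed, word, p_error=0.05):
    """P(typed | word): a crude, unnormalized error model (assumed, not estimated)."""
    if typed == word:
        return 1.0 - p_error
    if typed in edits1(word):
        return p_error
    return 0.0

def correct(typed):
    """Return the known word maximizing P(typed | word) * P(word)."""
    best = max(COUNTS, key=lambda w: channel(typed, w) * prior(w))
    # Fall back to the input when no known word is within one edit.
    return best if channel(typed, best) * prior(best) > 0 else typed

for typo in ["teh", "cta", "mat", "xyzzy"]:
    print(typo, "->", correct(typo))
```

The same argmax over a likelihood times a prior underlies HMM decoding for speech recognition, with an acoustic model in place of the typing-error model and a language model in place of the word-frequency prior.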