Sunday, April 26, 2009

Languages and Grammars (formalization)

Hi everyone, today we are going to see some definitions about formal grammars and languages, I'm giving you this because I want you to be able to read my blog without the help of other texts. These definitions are important to formalize a NLP problem or to describe steps for a solution. I really think it is important to understand this way of writing because it is shorter and especially because it is mostly used.


Ok, moving forward, we are going to start with languages, a language L over Σ, L(Σ) is any sub set of W(Σ). (I hope you’ve read the previous post, if you don’t take a look at it in order to understand these symbols)


There are some operations over languages: (sorry, I made it short because I’m not very interested in talking about languages operations)


  • Union: 

  • Intersection: 

  • Difference: 

  • Concatenation 

  • Potencia: 

  • Positive Closure

  • Closure:


The languages are basically the words that we used in a NLP problem, as easier as that!. So, how can we define our language?, Well, you can do it by writing down each of the words (commonly used for short languages) or using grammars. The general idea of grammars is that they express the “way” we can create a word for a particular language. This “way” is what we call Production or Rule, formally, a Rule x ::= y is an order pair (x,y) where x, y belongs to W(Σ) meaning that x can be replaced by y in every word v ϵ W(Σ). For example, consider the production G : a :: = b, and the word “abba”, if we applied the production G to “abba” we get “bbbb” (easy).


So now is time to define formal grammars which is what we are interested today:


A formal grammar G : can be defined as: 


where, 


ΣT is the alphabet of Terminal Symbols (language symbols)

ΣN is the alphabet of Non Terminal Symbols (intermediate symbols used to achieve the word construction with Terminal Symbols) No terminal symbols accepted in this alphabet

S is an Axiom, a Symbol that is the first symbol of any word of the language

P is the set of Rules used to create the words


Let's see an example,


G = ( {q,w},{a,b},{q},P)


P={ a :: = bqb, b :: = qw, b::=q, b::=w  }


which creates something like:


L = { qqq, wqw, qwqqw, .... }


Now that you know what a grammar is (  ), you should know that a man called Chomsky defined four types of formal grammars, please do an effort and read it, I won't do the effort to write  :


Type 0: No restrictions


Type 1: context dependent


Type 2: context independent


Type 3: regular grammar



or 



Now you may be thinking: "Ok, what is used for??",  well let me give you an example, suppose you are working on a problem where you must find patterns in a text. This patterns might be representable by a regular expression, so you might start working on a grammar of type 0 and start removing all the rules up to obtain a regular grammar, after that you can represent that grammar with a regular expression!!!..


I'll promise I'll bring up something more fun next post!!! 

No comments:

Post a Comment