I tried but failed to adequately explain Automata and Formal Grammar in the comments to this post by Will.
You can live a perfectly satisfactory life without understanding Automata Theory. I don’t completely understand it myself. But I’m a stubborn sort; I hate to fail, so I’m going to try again here, below the fold.
I’ll try to keep it as simple as I can, and use baseball as an example, since that’s something we all understand here. I’ll show how baseball is, by one definition, a language. If you still don’t get it after this, don’t worry, be happy.
Automata
An automaton is an abstract “machine”, which moves from one “state” to another “state”.
The Base/Out situation in baseball is a kind of automaton. It has 24 possible states:
0 out, 0 on
0 out, runner on 1st
0 out, runner on 2nd
0 out, runner on 3rd
0 out, runners on 1st & 2nd
0 out, runners on 1st & 3rd
0 out, runners on 2nd & 3rd
0 out, bases loaded
1 out, 0 on
1 out, runner on 1st
1 out, runner on 2nd
1 out, runner on 3rd
1 out, runners on 1st & 2nd
1 out, runners on 1st & 3rd
1 out, runners on 2nd & 3rd
1 out, bases loaded
2 out, 0 on
2 out, runner on 1st
2 out, runner on 2nd
2 out, runner on 3rd
2 out, runners on 1st & 2nd
2 out, runners on 1st & 3rd
2 out, runners on 2nd & 3rd
2 out, bases loaded
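If you'd rather count than take my word for it, here is a minimal sketch in Python that builds the same list. The names OUTS, BASES and states are my own, made up for illustration: three possible out counts times eight ways to fill or empty the three bases gives 24.

```python
from itertools import product

# Number of outs before the inning ends.
OUTS = (0, 1, 2)

# Each base is either empty (False) or occupied (True): (1st, 2nd, 3rd).
BASES = list(product((False, True), repeat=3))

# Every Base/Out state is one out count paired with one base arrangement.
states = [(outs, bases) for outs in OUTS for bases in BASES]

print(len(states))  # 24
```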
When you start an inning, you begin in this state:
0 out, 0 on
One batter later, there are five possible states you could be in:
0 out, 0 on (batter homered)
0 out, runner on 1st
0 out, runner on 2nd
0 out, runner on 3rd
1 out, 0 on
So that’s a rule of this Base/Out automaton: from the “0 out, 0 on” state, you can only go to one of these five states.
You can’t immediately jump from “0 out, 0 on” to “2 out, bases loaded”. You have to go through intermediate states first.
So you have state transition rules like this:
“0 out, 0 on” => “0 out, 0 on”
“0 out, 0 on” => “0 out, runner on 1st”
“0 out, 0 on” => “0 out, runner on 2nd”
“0 out, 0 on” => “0 out, runner on 3rd”
“0 out, 0 on” => “1 out, 0 on”
You do NOT have rules like this:
“0 out, 0 on” => “2 out, bases loaded”
Suppose you moved from the “0 out, 0 on” state to the “1 out, 0 on” state. From there, you can move on to the following states, and only these states:
1 out, 0 on (batter homered)
1 out, runner on 1st
1 out, runner on 2nd
1 out, runner on 3rd
2 out, 0 on
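The rules so far can be sketched as a lookup table. This is only an illustration, assuming we model just the two starting states listed above; the names TRANSITIONS, can_move and is_valid_path are hypothetical, and a full version would need an entry for each of the 24 states.

```python
# Transition rules from the text, keyed by the current state.
# Only two of the 24 states are modeled here (an assumption for brevity).
TRANSITIONS = {
    "0 out, 0 on": {
        "0 out, 0 on",  # batter homered
        "0 out, runner on 1st",
        "0 out, runner on 2nd",
        "0 out, runner on 3rd",
        "1 out, 0 on",
    },
    "1 out, 0 on": {
        "1 out, 0 on",  # batter homered
        "1 out, runner on 1st",
        "1 out, runner on 2nd",
        "1 out, runner on 3rd",
        "2 out, 0 on",
    },
}


def can_move(current, nxt):
    """Return True if one batter can take us from current to nxt.

    Raises KeyError if current is a state this sketch does not model.
    """
    return nxt in TRANSITIONS[current]


def is_valid_path(path):
    """Check every consecutive pair of states in a sequence."""
    return all(can_move(a, b) for a, b in zip(path, path[1:]))


print(can_move("0 out, 0 on", "1 out, 0 on"))          # True
print(can_move("0 out, 0 on", "2 out, bases loaded"))  # False
print(is_valid_path(["0 out, 0 on", "1 out, 0 on", "2 out, 0 on"]))  # True
```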
Chomsky’s innovation in linguistics was to apply automata theory to natural languages. Let’s explore that.
Just like you started the inning in the “0 out, 0 on” state, you also begin your sentence in a certain state. Suppose you start a sentence with an article, like the word “The”. Let’s call that the “Article state.”
There are rules which govern what kind of “word state” can follow this “Article state”. Some examples:
You can say “The fox”.
You can say “The quick” (as in “the quick fox”)
You can’t say “The the” (music groups aside).
You can’t say “The jumped”
You can’t say “The quickly”
You can’t say “The of”
So English has state transition rules that look like this:
Article state => Noun state
Article state => Adjective state
But no rules like this:
Article state => Article state
Article state => Verb state
Article state => Adverb state
Article state => Preposition state
Just like you can’t go directly from “0 out, 0 on” to “2 out, bases loaded” in baseball, you can’t go directly from the “Article state” to a verb, adverb, preposition, or another article. You have to follow it with either an adjective or a noun. From there, the “Adjective state” and the “Noun state” will have their own rules about what states can follow.
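The same lookup-table idea works for the word states. Here is a rough sketch, assuming a tiny made-up dictionary (WORD_STATE) that covers only the example words above, and a rule table (ALLOWED_AFTER) that models only the “Article state”. Real English would need far more of both.

```python
# Which word states may follow each word state.
# Only the Article state is modeled (an assumption for this sketch).
ALLOWED_AFTER = {
    "Article": {"Noun", "Adjective"},
}

# A toy dictionary mapping the example words to their word states.
WORD_STATE = {
    "the": "Article",
    "fox": "Noun",
    "quick": "Adjective",
    "jumped": "Verb",
    "quickly": "Adverb",
    "of": "Preposition",
}


def can_follow(first, second):
    """Return True if the state of `second` may follow the state of `first`."""
    first_state = WORD_STATE[first.lower()]
    second_state = WORD_STATE[second.lower()]
    return second_state in ALLOWED_AFTER[first_state]


print(can_follow("The", "fox"))     # True
print(can_follow("The", "quick"))   # True
print(can_follow("The", "the"))     # False
print(can_follow("The", "jumped"))  # False
```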
Formal grammars
Formal grammars are how scientists express rules for these automata. A rule is usually written in a form like this:
S -> AB
which means that S (whatever S is) can be replaced with the sequence “A (whatever A is) followed by B (whatever B is)”.
Let’s look at an example:
NounPhrase -> Article Noun
This means we’ve defined a NounPhrase as an “Article” followed by a “Noun”. What’s an Article?
Article -> {the, a}
We’ve defined an “Article” as one of two words: “the” or “a”. What’s a Noun?
Noun -> {pig, dog, bat, base, pitcher, catcher}
We’ve defined “Noun” as one of these six words.
So with these three grammar rules, we can substitute to create 12 valid NounPhrases:
the pig
the dog
the bat
the base
the pitcher
the catcher
a pig
a dog
a bat
a base
a pitcher
a catcher
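The substitution can be done mechanically. Below is a minimal sketch, assuming the three rules are stored in a dictionary I've called GRAMMAR and expanded by a hypothetical helper named expand. It keeps replacing symbols with their definitions until only words are left.

```python
from itertools import product

# Each symbol maps to its alternatives; each alternative is a sequence
# of symbols or words. Anything not in the table is treated as a word.
GRAMMAR = {
    "NounPhrase": [["Article", "Noun"]],
    "Article": [["the"], ["a"]],
    "Noun": [["pig"], ["dog"], ["bat"], ["base"], ["pitcher"], ["catcher"]],
}


def expand(symbol):
    """Return every string of words that `symbol` can be replaced with."""
    if symbol not in GRAMMAR:
        return [symbol]  # a plain word, nothing left to substitute
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each part, then combine the parts in order.
        parts = [expand(part) for part in alternative]
        for combo in product(*parts):
            results.append(" ".join(combo))
    return results


phrases = expand("NounPhrase")
print(len(phrases))  # 12
print(phrases[0])    # the pig
```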
Now obviously, natural languages are much more complex than this, but these are the basic building blocks you use to describe any language.
Since you can describe baseball using such a formal grammar, the game of baseball is, by this definition, a language. It’s no wonder writers love baseball so.