These Tokens get converted into some higher dimension vectors which somehow tells the meaning of that token
value of these token got changed according to the sentence. Read Blog to understand in depth...
Then it predicts the next likely word of the sentence
Then this process got repeated multiple times to get the desired result