Hello! This is part 3 in a TBD-part series on creating an LLM from scratch! You can see part 2, creating lists, here, and part 4, creating matrices, here.
What is a vector?

For our purposes, a vector is an ordered list of numbers. Here’s an example of a 2-dimensional vector:

[5, 7]

And here’s an example of a 5-dimensional vector:
[3.5, 4, 2, 1.2, 943.89]

We use multi-dimensional vectors to represent things like words, sentences, and paragraphs, in the hope that each vector can encode multiple independent aspects of a token at once. At a high level, we’re guessing that different positions in the vector will come to be associated with different attributes: maybe the first number roughly correlates with how dog-like the word is, the second with royalty, the third with age, and so on. For more information, see this paper on word representations in vector space.
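To make the idea concrete, here is a minimal sketch in Python, which is an assumption on my part since this section doesn't name a language. It stores the two example vectors above as plain lists and adds a toy word table. The `dimension` helper, the `toy_embeddings` name, and every number in the table are made up for illustration; a real model would learn its values during training, and its positions would rarely map to attributes this cleanly.

```python
# A vector is an ordered list of numbers, so a plain Python list works.
v2 = [5, 7]
v5 = [3.5, 4, 2, 1.2, 943.89]


def dimension(vector):
    """Return how many numbers (components) the vector holds."""
    return len(vector)


# Hypothetical, hand-picked values for illustration only (not learned).
# Positions are assumed to mean: [dog-likeness, royalty, age].
toy_embeddings = {
    "puppy": [0.9, 0.0, 0.1],
    "king": [0.0, 0.9, 0.7],
    "prince": [0.0, 0.8, 0.2],
}

print(dimension(v2))  # 2
print(dimension(v5))  # 5

# Every word in the table uses the same number of dimensions.
for word, vector in toy_embeddings.items():
    print(word, vector, dimension(vector))
```

Order matters here: `[5, 7]` and `[7, 5]` are different vectors, because each position stands for a different attribute.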
...