General
1. A series of tests with Mixtral prompts revealed that wrapping the instructions in the special tokens [INST] [/INST] makes the model produce much better results than omitting them (see the example after this list).
2. In light of point 1, the model is likely to understand our commands better, so it makes sense to relax the strictness imposed on the model through the similarity coefficient.
3. Numerous tests were conducted with embeddings from this Embeddings page, as well as with embeddings available on Hugging Face. Some of the best results were shown by the LaBSE embeddings: although they were primarily designed for translating text across 109 languages, they also proved very effective in semantic search. OpenAI embeddings showed a similar level of quality, but unlike LaBSE they are paid. Thus, it was decided to switch to LaBSE, thereby improving the assistant's performance.
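For reference, this is what the instruction format looks like in practice (the instruction text itself is just an illustration):

```python
# Mixtral's instruction format: the command is wrapped in [INST] ... [/INST].
prompt = "[INST] Summarize the following article in three sentences. [/INST]"
```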
Embeddings are a way to represent words or other data units as vectors in a multidimensional space. Their main goal is to transform discrete objects, such as words or symbols, into continuous vector representations that can be used in various machine learning algorithms. Embeddings enable models to understand and process textual data more effectively.
How Embeddings Work
- Spatial Representation: Each word or object is represented as a point in a multidimensional space. The dimensionality of this space (the number of dimensions) is determined by the developer and usually consists of several hundred dimensions.
- Initialization: At the initial stage of training, embeddings can be initialized with random values or taken from pre-trained models.
- Training: During the training process of a model (e.g., a neural network), embeddings are updated so that semantically similar words have similar vectors. This is achieved through various methods:
  - Word Co-occurrence: Words that frequently occur near each other in text receive similar vectors.
  - Context: Words used in similar contexts also receive similar vectors.
- Mathematical Operations: Embeddings allow for mathematical operations with vectors, opening up possibilities for semantic analysis. For example, operations like addition and subtraction of vectors can be used to determine analogies (e.g., "king" - "man" + "woman" ≈ "queen").
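To make the analogy arithmetic concrete, here is a minimal sketch with hand-crafted toy vectors; real embeddings are learned and have hundreds of dimensions, so the words, dimensions, and values below are purely illustrative:

```python
import numpy as np

# Toy 3-dimensional "embeddings": [royalty, masculinity, femininity].
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.8]),
}

# "king" - "man" + "woman" should land near "queen".
result = vectors["king"] - vectors["man"] + vectors["woman"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Find the word whose vector is most similar to the result of the arithmetic.
best = max(vectors, key=lambda word: cosine(vectors[word], result))
print(best)  # queen
```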
Impact of Embeddings
- Context Understanding: Embeddings enable models to understand the context of words, which is especially important for natural language processing (NLP) tasks. This helps models distinguish between words with multiple meanings depending on their surroundings.
- Improving Model Quality: Using embeddings significantly enhances the quality of machine learning models in NLP tasks. They enable more accurate text classification, chatbot response generation, text translation, and more.
- Dimensionality Reduction: Embeddings allow for efficient dimensionality reduction of input data while preserving important information. This makes model training faster and less resource-intensive.
- General Language Model: Trained embeddings can be reused across different tasks and models, allowing for the use of accumulated knowledge about the language.
To demonstrate how this works in our assistant, let us walk through the pipeline step by step.
- A user uploads a file.
- LangChain (the framework for working with AI tools that we use) splits the text into blocks. We use recursive splitting: LangChain tries to split the text by a list of predefined separators, in order, until the blocks are small enough. This has the effect of keeping whole paragraphs together for as long as possible.
- The text blocks are then embedded using the Hugging Face embedding tool. In other words, the text is transformed into vectors of numbers.
- All the text embeddings are then saved in a vector database called Faiss.
- When a user makes a prompt to the assistant, this prompt is embedded in the same way.
- The embedding of the prompt is then compared with the article text embeddings stored in the database. The comparison is performed by calculating the L2 (Euclidean) distance between the embeddings. In two-dimensional space this is just the length of the line segment between two points; our embeddings have far more dimensions, but the concept is the same.
- We take the 7 text blocks whose L2 distance to the prompt embedding is the smallest, i.e. the blocks that are semantically closest to the prompt (see the sketch after this list).
- The assistant takes these blocks and forms the final answer based on them. This way, it analyzes only a few text parts that are as close to the prompt as possible instead of interpreting the whole text.
- The final answer is shown to the user, as we can see in the demonstration.
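Here is a minimal sketch of the distance calculation and the top-7 selection in plain NumPy; the dimensionality and the random vectors stand in for real embeddings, and in the actual assistant FAISS performs this search for us:

```python
import numpy as np

rng = np.random.default_rng(0)
chunk_embeddings = rng.normal(size=(100, 768))  # 100 stored text blocks
prompt_embedding = rng.normal(size=768)         # the embedded user prompt

# L2 (Euclidean) distance between the prompt and every stored block.
distances = np.linalg.norm(chunk_embeddings - prompt_embedding, axis=1)

# Indices of the 7 blocks with the smallest distance, i.e. the blocks
# that are semantically closest to the prompt.
top_7 = np.argsort(distances)[:7]
print(top_7, distances[top_7])
```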
Demonstration in the code
Here we connect via the API key to our embeddings, which are hosted on Hugging Face.
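A sketch of such a connection, assuming LangChain's Hugging Face Inference API wrapper, the sentence-transformers/LaBSE model id, and a hypothetical environment variable for the key:

```python
import os

from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

# Connect to the embedding model hosted on Hugging Face via an API key.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HUGGINGFACE_API_KEY"],  # hypothetical variable name
    model_name="sentence-transformers/LaBSE",
)
```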
In this code fragment, you can see the same recursive text splitting, where chunk_size is the size of one part and chunk_overlap is how many characters adjacent parts share.
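A sketch of that fragment; the chunk_size and chunk_overlap values and the input file name are assumptions:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("article.txt", encoding="utf-8") as f:  # hypothetical input file
    document_text = f.read()

# Recursive splitting: separators are tried in order (paragraphs first,
# then sentences, then words) until the blocks are small enough.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # size of one part in characters (value assumed)
    chunk_overlap=100,  # characters shared by adjacent parts (value assumed)
)
chunks = splitter.split_text(document_text)
```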
In the final code block, we do the following. We set the search parameters (mmr means that parts of the text with a lower similarity coefficient are also given attention). In search_kwargs, we pass the number of text parts that should be returned for further processing. After that, we set our prompt (note that the text contains [INST][/INST]). Finally, a chain is created from the context, the prompt, the model, and console output in the form of text.
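Sketched in code, that block could look roughly like this. The prompt wording and the model setup are assumptions (here Mixtral is assumed to run locally via Ollama; any LangChain-compatible LLM would fit), while the retriever and chain calls are standard LangChain APIs; chunks and embeddings come from the fragments above:

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Build the FAISS index from the split text blocks (see the splitter above).
db = FAISS.from_texts(chunks, embeddings)

# mmr: parts of the text with a lower similarity coefficient also get
# attention; search_kwargs "k" is the number of parts to return.
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 7})

# The prompt contains Mixtral's [INST][/INST] instruction tokens.
prompt = ChatPromptTemplate.from_template(
    "[INST] Answer the question using only this context:\n"
    "{context}\n\nQuestion: {question} [/INST]"
)

model = ChatOllama(model="mixtral")  # assumed local Mixtral deployment

# Chain: retrieved context + prompt -> model -> plain-text console output.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

print(chain.invoke("What is the article about?"))
```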