Overview
This report describes the article review process, which combines two main tasks. The first is to identify the criteria by which the model will evaluate articles, as well as the criteria for evaluating the performance of the model itself. The second is to test the results the model produces.
Several models have been tested so far (GPT-4o, Mixtral, and OpenBioLlama), and preliminary criteria for reviewing articles have been identified.
Review process and issues
The review process is divided into seven main stages, each of which the model handles with varying degrees of success. Below is the list of these stages and criteria, with a rationale for whether or not each step can be analyzed.
- Introduction and problem statement
  - Introduction and problem statement
  - Relevance and novelty
This point is generally feasible, with only minor limitations. The models were trained on a wide range of material from Wikipedia, articles, and other sources, which allows them to define the problem statement effectively.
Relevance, however, depends on the particular model: GPT-4o was trained on data up to October 2023, while Mixtral's knowledge extends only to about mid-2022. The accuracy and relevance of a model's judgments therefore depend heavily on how recent the model and its updates are.
- Research methods
  - Research methods
  - Research design
  - Description of methods
  - Ethical considerations
In general, this task is feasible because, as noted earlier, the models were trained on a wide range of materials.
Ethical considerations, however, are difficult to predict in advance: they are not present in all articles, and it is unknown whether such sections were present in the models' training data.
- Results and interpretation
  - Results
  - Analyses and interpretation
  - Statistical analysis
  - Interpretation of data
The models can also cope with this task, with the exception of statistics and possibly data interpretation.
All of the tested models may struggle with such problems and produce incorrect answers. In addition, the models do not yet understand graphs and charts well, which further complicates data interpretation. OpenBioLlama and Mixtral, for example, cannot interpret graphs at all, and they also fail at calculations more complex than simple addition.
For data interpretation, OpenBioLlama faces an additional problem: its small context window (8k tokens) limits how much text can be fed as input. Even a small article, together with the prompt and the expected response, can exceed 9k tokens.
This creates the following difficulty: the entire article cannot fit into the model, and when the article is broken into parts and fed in separately, the overall meaning of the data interpretation is lost, which can lead to an incorrect answer.
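The overflow can be checked before calling a model. The sketch below uses the rough heuristic of about four characters per token (actual tokenizers vary by model) and an assumed 8k context limit; the names and limits here are illustrative, not taken from any specific API.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token heuristic."""
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, article: str, context_limit: int = 8_000,
                   reserve_for_response: int = 1_000) -> bool:
    """Check whether prompt + article still leave room for the model's answer."""
    used = estimate_tokens(prompt) + estimate_tokens(article)
    return used + reserve_for_response <= context_limit

# A 40,000-character article (~10k tokens) will not fit in an 8k window.
article = "x" * 40_000
print(fits_in_window("Review this article:", article))  # False
```

Such a pre-check makes it possible to decide in advance whether an article must be split into chunks at all.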
There are several solutions to this problem:
- The chat history can be preserved and passed to the model each time a new piece of the article is loaded. In this case, however, the history grows with each new chunk, and by the end there is no difference from feeding the whole article as input.
- A brief summary of the chat history can be made and fed to the model instead. In this case, however, important information may be lost: there is no guarantee that summarizing the history keeps only the right data, or all of it. Moreover, this option does not fully solve the token overflow problem; at best, only the smallest articles can be fed as input, with the risk that some important information is lost.
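The second option can be sketched as a rolling-summary loop. The `summarize` and `review_chunk` callables below are hypothetical stand-ins for model calls, since the actual API depends on which model is used; the point is only the control flow, where the model sees a compressed summary rather than the full history.

```python
from typing import Callable, List

def review_in_chunks(chunks: List[str],
                     summarize: Callable[[str], str],
                     review_chunk: Callable[[str, str], str]) -> str:
    """Feed an article chunk by chunk, carrying a running summary instead of
    the full chat history so the context window is not exceeded."""
    summary = ""
    review = ""
    for chunk in chunks:
        # The model sees only the compressed summary plus the current chunk.
        review = review_chunk(summary, chunk)
        # Compress everything seen so far for the next iteration.
        summary = summarize(summary + "\n" + chunk)
    return review
```

The trade-off described above is visible in the code: whatever `summarize` discards is unavailable to every later chunk, which is exactly where important information can be lost.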
- Discussion
  - Comparison with previous studies
There are two options here:
- Feed previous studies as input, though this consumes part of the token budget available for the article itself.
- Rely on the completeness of the model's own training.
- Conclusion and overall evaluation
  - Conclusions
  - Overall evaluation
  - Recommendations for improvement
Most models should be able to cope with this task given a sufficient number of tokens; but if the article is fed in chunks, then, as already noted, some important information may be lost.
- Presentation and formatting of the article
  - Presentation of the article
  - Structure and logic of presentation
  - Language and style
  - Illustrative material
  - Literature
Regarding the illustrative material: as noted earlier, virtually no current model handles it effectively, and Mixtral and OpenBioLlama are no exception. For the other items, no problems are expected. However, if the article is split into parts to fit the model's input, the model will only produce results for the individual parts, without considering their relationship to the rest of the article.
As for the literature, it is preferable that the model can load the article in full, so that it can assess whether the references are relevant to the content. It is also worth noting that the newer the model, the more up-to-date its answers.
- References and citation
  - Completeness and relevance of the list of references (needs more testing)
  - Citation
To evaluate citations, the article must be loaded in full, so that the model can understand the context in which each citation is used and interpret it correctly.
Conclusion
Based on the above, two main challenges that models face in text processing can be identified.
The first is the limit on the number of tokens the model can process at once: the larger the token window, the better the model can understand the context and coherence of the text.
When this is addressed by breaking the text into smaller parts, another problem arises: loss of narrative coherence. To mitigate this, the Mixtral model can be used, which has a 32k token window and can therefore handle longer texts than OpenBioLlama.