Last revised: 09/06/2024
PrivateAI applies AI and blockchain technologies to big data inference based on Knowledge Graph use cases; knowledge graphs have proven to be an effective tool for data analysis [4, 5, 6, 7]. Representing data as knowledge graphs enables unique business cases: drug repurposing, clinical trial redesign, drug evaluation, personalized medicine, research hypothesis generation, etc.
Building valuable Knowledge Graph applications usually requires significant effort from a multidisciplinary team of experts [1, 2]. PrivateAI makes these insights available to a wider audience by providing:
To deliver value to end users, three types of components have to be combined in a Data Pipeline:
Each of these components requires domain expertise, so the task becomes a cross-field enterprise that is very difficult for a single team to accomplish. Cooperation is therefore key.
PrivateAI creates a marketplace of data products, where each of these components can be maintained by a team of experts and provided to the community as a well-positioned product, and an environment that helps maintain and develop data products with integrated AI tools, data protection, and decentralized, provably fair income distribution.
The general stages of the PrivateAI roadmap are:
I. PrivateAI Demo App completion - a demo data product able to onboard new knowledge, generate knowledge graphs, and enable their exploration.
II. Scaling Business Use Cases - extension of the integrated datasets to enable a wider range of business use cases, such as drug repurposing, therapeutic assistance, and hypothesis research, and to involve expert users in testing and economic evaluation.
III. Marketplace Public Launch - an open marketplace that anyone can join to present and trade their own data products. Expansion from medico-biological use cases to wider applications will come with community growth and further development of AI supporting tools and use case provision pipelines.
Completion of the PrivateAI demo app - the first implementation of a user app based on the PrivateAI data pipeline, with full-scale knowledge graph exploration, knowledge quality assurance systems, and basic economics.
Features that can be expected, in addition to the current version of the PrivateAI app:
Integration of the Llama-3-based OpenBioLLM by Saama AI Labs, which enables advanced AI assistance in biomedical use cases, e.g. interpretation of medical research results.
Upgrade of the named entity recognition (NER) and relation extraction (RE) modules of the graph acquisition algorithm with the NLP library spaCy v3.7.4 for improved text tokenization, and the spacy-llm package v0.7.1 for built-in NER and RE tasks with few-shot prompting and caching of processed articles.
The app frontend will be upgraded with a Common Knowledge Graph with multi-layered representation and a customizable graph scheme guided by user requests.
PrivateAI has a comparatively small community of contributors, so the knowledge base is thin on many topics that could benefit researchers. Integration of arXiv database search lets users get results from both the PrivateAI knowledge base and the arXiv.org website, which significantly extends the utility of the platform as a source of knowledge.
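A combined search of this kind might be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes the public arXiv API endpoint (`export.arxiv.org/api/query`, which returns an Atom feed), and the function names, the result dictionary layout, and the title-based de-duplication rule are all hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; using it here is an assumption about the integration.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed


def build_arxiv_url(query: str, max_results: int = 5) -> str:
    """Build a full-text ("all:") search URL for the arXiv API."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(xml_text: str) -> list[dict]:
    """Extract title and link from each <entry> of an Atom feed."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        results.append({
            "source": "arxiv",
            # Titles in the feed may contain line breaks; collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "url": entry.findtext(f"{ATOM}id", ""),
        })
    return results


def merge_results(local: list[dict], remote: list[dict]) -> list[dict]:
    """Put knowledge-base hits first and drop remote hits with the same title."""
    seen = set()
    merged = []
    for item in local + remote:
        key = item["title"].casefold()
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged
```

In use, the URL from `build_arxiv_url` would be fetched (for example with `urllib.request.urlopen`), the response text passed to `parse_arxiv_feed`, and the outcome merged with the internal search results. Listing local results first is one possible design choice; a real ranking would likely be more involved.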
As the knowledge base grows, a need for finer-grained access control has emerged. The 'Read-only' access level introduces an intermediate state in which an article is available to read in the PrivateAI internal reader, which restricts copying, parsing, and export. This lets data owners share an article without the risk of giving it away.
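The intermediate access level can be pictured as a small permission table. The sketch below is illustrative only: the level names `PRIVATE` and `FULL`, the action names, and the `is_allowed` helper are assumptions; only the 'Read-only' behavior (reading allowed, copying, parsing, and export restricted) comes from the description above.

```python
from enum import Enum


class AccessLevel(Enum):
    PRIVATE = "private"      # hypothetical: visible to the owner only
    READ_ONLY = "read_only"  # readable in the internal reader, nothing else
    FULL = "full"            # hypothetical: no restrictions


# Actions permitted at each level (action names are assumptions).
PERMISSIONS = {
    AccessLevel.PRIVATE: frozenset(),
    AccessLevel.READ_ONLY: frozenset({"read"}),
    AccessLevel.FULL: frozenset({"read", "copy", "parse", "export"}),
}


def is_allowed(level: AccessLevel, action: str) -> bool:
    """Return True if the given action is permitted at this access level."""
    return action in PERMISSIONS[level]
```

For example, `is_allowed(AccessLevel.READ_ONLY, "read")` is true, while the same call with `"export"` is false, which matches the intent of sharing an article without giving it away.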
The ability to freely upload materials to the platform brings a certain number of misuse and abuse cases. Malicious actors may upload the same materials repeatedly to abuse the reward system, or infringe intellectual property to gain benefits on the platform. Integration of anti-plagiarism checks and filtering aims to protect users and the platform from such attempts.
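One common way to catch repeated uploads is to compare a normalized hash for exact copies and word-shingle Jaccard similarity for near copies. The following is a minimal sketch of that general technique, not a description of the filter PrivateAI uses; the function names, the shingle size of 3, and the 0.8 threshold are assumptions chosen for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only word tokens separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def fingerprint(text: str) -> str:
    """SHA-256 of the normalized text; identical for exact re-uploads."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word windows; shorter texts yield one shingle of all words."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.8) -> bool:
    """Flag the candidate if it matches any stored text exactly or nearly."""
    fp = fingerprint(candidate)
    for doc in existing:
        if fingerprint(doc) == fp or jaccard(candidate, doc) >= threshold:
            return True
    return False
```

A pairwise scan like this is only practical for small collections; at scale, techniques such as MinHash with locality-sensitive hashing are the usual replacement, and detecting paraphrased plagiarism would need semantic comparison beyond this sketch.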
The current deployment on AWS cloud infrastructure imposes certain limitations on platform performance; in particular, it limits the input size and the deployment of our own computationally demanding AI models. Migration to dedicated infrastructure in data centers located in Latvia or Germany will allow a wider input range and increase overall platform performance.
Bonus points to incentivize users, based on Solidity ERC-20 smart contracts and initially deployed on one of the EVM-compatible blockchains. The reward system will be used as a prototype of a research group economy and as part of the more general marketplace economy. Bonus balances will be stored in data maps so they can later be migrated to PrivateAI's own blockchain.
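The idea of keeping balances in a map that can later be migrated can be modeled off-chain in a few lines. The sketch below is written in Python rather than Solidity and is not the ERC-20 contract itself; the `BonusLedger` class, its method names, and the snapshot format are hypothetical and only illustrate the bookkeeping.

```python
class BonusLedger:
    """Toy model of a bonus-point balance map (account -> points)."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.total_supply = 0

    def reward(self, account: str, amount: int) -> None:
        """Credit newly issued bonus points to an account."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[account] = self.balances.get(account, 0) + amount
        self.total_supply += amount

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        """Move points between accounts; the total supply is unchanged."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def snapshot(self) -> dict[str, int]:
        """Non-zero balances in sorted account order, e.g. for a later migration."""
        return {acct: bal for acct, bal in sorted(self.balances.items()) if bal > 0}
```

A snapshot of this shape could, under these assumptions, serve as the input for seeding balances on another chain, with the sum of the snapshot values checked against `total_supply` before migration.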
Further improvement of PrivateAI's first app, refactoring of the UI into a modular paradigm, and introduction of new use case modules based on new graph schemes and datasets. Pilot projects for new types of data products: graph schemes, GUI modules, and datasets. First implementation of group management for Data Product development and distribution.
Features that can be expected:
Introduction of peer-review use cases and economics. Unrestricted upload of articles makes it easy to demonstrate the platform's abilities but lowers the quality of the content. To remedy that and increase platform credibility, AI models, including advanced large language models, will be applied to automate the review of new articles to the extent that current technology allows. Later, this feature will be converted into a Peer-review Assistant to help experts review articles.
Exploration of collaborative graph editing, with possible use cases of AI-guided identification of conflicts and inconsistencies and their collective resolution via voting on DAO smart contracts.
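Collective resolution of a conflicting graph edit could, in its simplest form, be a quorum-plus-majority vote. The sketch below models that rule off-chain in Python; it is an assumption about how such voting might work, not the DAO contract logic, and the `resolve_conflict` function, the quorum of 3, and the strict-majority rule are hypothetical.

```python
from collections import Counter
from typing import Optional


def resolve_conflict(votes: dict[str, str], quorum: int = 3) -> Optional[str]:
    """Pick the winning candidate for a contested graph edit.

    `votes` maps a voter id to the candidate that voter supports (one vote
    per voter). Returns the candidate backed by a strict majority of the
    votes cast, or None if the quorum is not met or no candidate has a
    strict majority.
    """
    if len(votes) < quorum:
        return None
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    if count * 2 <= len(votes):
        return None  # tie or plurality only: leave the conflict unresolved
    return winner
```

For instance, if two of three voters support the relation label "treats" and one supports "causes", the function returns "treats"; with an even split it returns `None`, leaving the conflict open for further discussion.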
Research into the possibility of integrating large open biomedical databases such as AACT [3] and their graph representations.
Dedicated user interface modules for each practical use case have demonstrated efficiency in high-load, high-complexity systems, as shown in [1]. This makes PrivateAI flexible and adaptable to emerging needs and application upgrades. The current PrivateAI interface will be refactored into the first set of GUI modules. The following modules were suggested by the CTKG research team [2]:
Development of a web interface to facilitate trades between data providers and data consumers. Distribution and testing of data product development across multiple data consumer groups. Integration with prototype CosmWasm smart contracts. Data storage will be developed in parallel as a separate service that does not affect the operation of the marketplace.
Pilot data provider groups continue to be rewarded for their contributions with bonus points that will later be converted into marketplace coins or other valuable rewards.
A public platform to develop and share Data Products, guided by AI tools and incentivized by a blockchain economy.
Features that can be expected:
Development of the necessary infrastructure (marketplace web wallets, blockchain explorer, public dashboard) to launch a public testnet and introduce data product navigation and access. This will serve as an integration test of the whole ecosystem before going into production and integrating with the real economy.
The next step is to generalize the logic of the data product management contracts into a common marketplace policy and implement it as a blockchain built on the Cosmos SDK framework. At that stage we develop PrivateAI PSN nodes, which form the first layer of the chain, responsible for asset transfers and enforcement of smart contracts.
The output is validator nodes, network consensus adjustments that implement general marketplace policies, and a common transactional layer for trades between data providers and data consumers and for votes inside a data provider group.
Community-maintained clusters of Cosmos SDK nodes that provide access to the platform via REST API web servers.
Refactoring of data storage and use cases to comply with international and local standards.
The roadmap is under active development and is subject to change without prior notice.
1. Ng, D. Standard Chartered: Threat Intelligence Using Knowledge Graphs, https://neo4j.com/blog/standard-chartered-threat-intelligence-using-knowledge-graphs/
2. Chen, Z., Peng, B., Ioannidis, V.N. et al. A knowledge graph of clinical trials (CTKG). Sci Rep 12, 4724 (2022). https://doi.org/10.1038/s41598-022-08454-z, https://www.nature.com/articles/s41598-022-08454-z#Sec1
3. AACT (Aggregate Analysis of ClinicalTrials.gov), an open clinical trials database, https://aact.ctti-clinicaltrials.org/
4. Pujara, J., Miao, H., Getoor, L. & Cohen, W. Knowledge graph identification. In International Semantic Web Conference (ISWC) 542–557 (Springer, 2013).
5. Ma, Y., Crook, P. A., Sarikaya, R. & Fosler-Lussier, E. Knowledge graph inference for spoken dialog systems. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5346–5350 (IEEE, 2015).
6. Ji, S., Pan, S., Cambria, E., Marttinen, P. & Yu, P. S. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 33, 494–514. https://doi.org/10.1109/TNNLS.2021.3070843 (2022).
7. Ioannidis, V. N. et al. DRKG—Drug Repurposing Knowledge Graph for Covid-19. https://github.com/gnn4dr/DRKG/ (2020).