Use of an amorphous organic polymer that conducts electricity: the goal is an organic polymer that retains its conductive properties without needing an ordered structure, so that it can self-heal, self-grow, and act like neurons in the human brain for use with Holistic AI.
The polymer is made with tetrathiafulvalene (TTF). The molecule is built from conjugated rings of sulphur and carbon that allow electrons to delocalize across the structure, making TTF a “voracious π-stacker.”
Salts of BEDT-TTF, BEST (bis(ethylenediseleno)tetrathiafulvalene), and BETS (Scheme 1) with a simple organic anion, isethionate (HOC2H4SO3−), are used to develop future Holistic AI (HAI) systems that can learn like the human brain, creating neural networks with the ability to self-learn and self-heal.

The future of holistic AI (HAI) lies in learning to interpret content more holistically, which means working in multiple modalities (such as text, speech, and images) at once. For example, recognizing whether a meme is hateful requires the AI to consider both the image and the text of the meme together. It will also require building multimodal models for augmented and virtual reality devices, so they can recognize the sound of an alarm, for example, and display an alert showing which direction the sound is coming from.
Historically, analysing such different formats of data together (text, images, speech waveforms, and video, each of which has typically required its own model architecture) has been extremely challenging for machines.
Over the last couple of years, organisations researching the future of holistic AI (HAI) have produced a slew of research projects, each addressing an important challenge of multimodal perception: easing the shortage of publicly available training data (for example, with Hateful Memes), creating a single algorithm for vision, speech, and text, building foundational models that work across many tasks, and finding the right model parameters.
Today, X-HAL is sharing a summary of some of the research being conducted.

Omnivore: A single model for images, videos, and 3D data
New Omnivore models being developed can operate on image, video, and 3D data using the same parameters, without degrading performance on modality-specific tasks. For example, the same model can recognize basic objects in both 3D models and simple videos. This enables radically new capabilities, such as AI systems that can search and detect content in both images and videos. Omnivore has achieved state-of-the-art results on popular recognition tasks across all three modalities, with particularly strong performance on video recognition. This could have a major impact on defense systems, drone video analysis, and the data and intelligence of military command and control systems, including C2, C4I and CSRC. It is probably the fastest-growing market for Holistic AI (HAI).
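To make the idea concrete, the sketch below shows the general pattern of a single shared transformer trunk whose parameters are reused across modalities, with each modality keeping only a lightweight tokenizer of its own. This is an illustrative reconstruction of the approach, not the released Omnivore code; the class name SharedTrunkClassifier, the use of RGB-D input as the “3D” modality, and all dimensions are assumptions made for the example.

```python
# Minimal sketch (not the released Omnivore code): one shared transformer trunk
# with modality-specific patch embeddings for images, video clips, and RGB-D input.
import torch
import torch.nn as nn

class SharedTrunkClassifier(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8, num_classes=1000):
        super().__init__()
        # Each modality gets its own lightweight "tokenizer" ...
        self.image_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)   # RGB images
        self.video_embed = nn.Conv3d(3, dim, kernel_size=(2, 16, 16),
                                     stride=(2, 16, 16))                  # RGB video clips
        self.depth_embed = nn.Conv2d(4, dim, kernel_size=16, stride=16)   # RGB-D ("3D") views
        # ... while the transformer trunk and classifier head are shared across modalities.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, modality):
        if modality == "image":
            tokens = self.image_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        elif modality == "video":
            tokens = self.video_embed(x).flatten(2).transpose(1, 2)
        elif modality == "rgbd":
            tokens = self.depth_embed(x).flatten(2).transpose(1, 2)
        else:
            raise ValueError(f"unknown modality: {modality}")
        features = self.trunk(tokens).mean(dim=1)                     # pooled representation
        return self.head(features)

model = SharedTrunkClassifier()
image_logits = model(torch.randn(2, 3, 224, 224), "image")
video_logits = model(torch.randn(2, 3, 8, 224, 224), "video")
rgbd_logits = model(torch.randn(2, 4, 224, 224), "rgbd")
```

Because every modality is reduced to a sequence of tokens before reaching the trunk, the same weights can score images, videos, and 3D views, which is what enables cross-modality search and detection.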
FLAVA: A foundational model spanning dozens of multimodal tasks
FLAVA represents a new class of “foundational model” that’s jointly trained to do over 35 tasks across domains, including image recognition, text recognition, and joint text-image tasks. For instance, the FLAVA model can single-handedly describe the content of an image, reason about its text entailment, and answer questions about the image. FLAVA also leads to impressive zero-shot text and image understanding abilities over a range of tasks, such as image classification, image retrieval, and text retrieval.
FLAVA not only improves over prior work that is typically only good at one task; unlike prior work, it also uses a shared trunk that was pre-trained on openly available public image-text pairs, which will help further advance research. Like Omnivore, it promises to have a large impact on defense and future warfare, providing better-analysed and more detailed information from drone reconnaissance videos and richer information for central command and control to make better-informed decisions.
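The sketch below illustrates the shared-trunk pattern behind a FLAVA-style model: separate image and text encoders feed a multimodal trunk for joint image-text tasks, while pooled contrastive features support zero-shot retrieval and classification. It is a simplified sketch under those assumptions, not the released FLAVA architecture; FlavaStyleModel, its heads, and all sizes are hypothetical names chosen for the example.

```python
# Simplified sketch of a FLAVA-style shared trunk (not the released FLAVA code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def encoder(dim, depth, heads):
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class FlavaStyleModel(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, depth=2, heads=8):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # image -> tokens
        self.text_embed = nn.Embedding(vocab_size, dim)                  # token ids -> vectors
        self.image_encoder = encoder(dim, depth, heads)
        self.text_encoder = encoder(dim, depth, heads)
        self.multimodal_trunk = encoder(dim, depth, heads)   # fuses both token streams
        self.itm_head = nn.Linear(dim, 2)                    # image-text matching head

    def encode_image(self, images):
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)     # (B, N, dim)
        return self.image_encoder(tokens)

    def encode_text(self, token_ids):
        return self.text_encoder(self.text_embed(token_ids))             # (B, L, dim)

    def contrastive_features(self, images, token_ids):
        # Pooled, L2-normalised features used for zero-shot retrieval / classification
        img = F.normalize(self.encode_image(images).mean(dim=1), dim=-1)
        txt = F.normalize(self.encode_text(token_ids).mean(dim=1), dim=-1)
        return img, txt

    def image_text_matching(self, images, token_ids):
        # Joint image-text task handled by the shared multimodal trunk
        fused = torch.cat([self.encode_image(images), self.encode_text(token_ids)], dim=1)
        return self.itm_head(self.multimodal_trunk(fused).mean(dim=1))

model = FlavaStyleModel()
img_feat, txt_feat = model.contrastive_features(
    torch.randn(2, 3, 224, 224), torch.randint(0, 30522, (4, 16)))
zero_shot_scores = img_feat @ txt_feat.t()   # 2 images scored against 4 candidate captions
match_logits = model.image_text_matching(
    torch.randn(2, 3, 224, 224), torch.randint(0, 30522, (2, 16)))
```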
CM3: Generalizing to new multimodal tasks
CM3 is one of the most general open-source multimodal models available today. By training on a large corpus of structured multimodal documents, it can generate completely new images and captions for those images. It can also infill complete images or longer structured text sections, conditioned on the rest of the document. Using prompts written in an HTML-like syntax, the exact same CM3 model can generate new images or text, caption images, and disambiguate entities in text.
Traditional approaches to pretraining have coupled architectural choices (e.g., encoder-decoder) with objective choices (e.g., masking). The novel “causally masked objective” used by CM3 gets the best of both worlds by introducing a hybrid of causal and masked language models.
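One rough way to picture the causally masked objective is as a data transform: a contiguous span is cut out of the document, replaced by a sentinel token, and appended at the end, so a standard left-to-right language model learns both ordinary generation and infilling. The snippet below is an illustrative reconstruction under that assumption, not the actual CM3 preprocessing code; the token names <mask:0> and <eos> and the single-span setup are placeholders for the example.

```python
# Illustrative sketch of a causally-masked data transform (not CM3's actual code):
# cut out one span, leave a sentinel in its place, and append the span at the end
# so a causal language model can be trained to regenerate it.
import random

MASK, END = "<mask:0>", "<eos>"

def causally_mask(tokens, rng=random):
    # Choose a contiguous span to move to the end of the sequence
    start = rng.randrange(len(tokens))
    length = rng.randrange(1, len(tokens) - start + 1)
    span = tokens[start:start + length]
    prefix = tokens[:start] + [MASK] + tokens[start + length:]
    # Training sequence: prefix-with-sentinel, then sentinel + original span + end marker
    return prefix + [MASK] + span + [END]

doc = "<img src=0> a photo of a cat sitting on a sofa".split()
print(causally_mask(doc))
```

At generation time the same trained model can either continue a document left to right or, when prompted with a sentinel, fill in the missing image or text span conditioned on everything around it.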
Data2vec: The first self-supervised model that achieves SOTA for speech, vision, and text
Research in self-supervised learning today is almost always focused on one modality. The recent breakthrough data2vec research shows that the exact same model architecture and self-supervised training procedure can be used to develop state-of-the-art models for recognition of images, speech, and text. Data2vec can be used to train models for speech or natural language. It demonstrates that the same self-supervised algorithm can work well in different modalities, and it often outperforms the best existing algorithms.
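At its core, data2vec trains a student network on a masked view of the input to predict latent representations produced by an exponential-moving-average teacher from the unmasked view, and the same recipe is applied to speech, vision, and text. The sketch below is a simplified single-modality training step under those assumptions: it regresses only the teacher's final-layer output (data2vec averages several top layers), and the feature extractor and hyperparameters are placeholders, so it is not the released data2vec code.

```python
# Simplified data2vec-style step: a masked student regresses latent targets from an
# EMA teacher that sees the unmasked input. Illustrative only, not the released code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, depth = 256, 4
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
student = nn.TransformerEncoder(layer, num_layers=depth)
teacher = copy.deepcopy(student)              # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)
mask_token = nn.Parameter(torch.zeros(dim))   # learned embedding for masked positions
opt = torch.optim.AdamW(list(student.parameters()) + [mask_token], lr=1e-4)

def training_step(x, mask_prob=0.15, ema=0.999):
    # x: (batch, seq, dim) embeddings from any modality-specific feature extractor
    with torch.no_grad():
        targets = teacher(x)                  # latent targets from the unmasked view
    mask = torch.rand(x.shape[:2]) < mask_prob
    x_masked = torch.where(mask.unsqueeze(-1), mask_token, x)
    preds = student(x_masked)
    loss = F.smooth_l1_loss(preds[mask], targets[mask])   # regress latents at masked spots
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                     # exponential-moving-average teacher update
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()

print(training_step(torch.randn(2, 50, dim)))
```

Because the prediction target is a latent representation rather than words, pixels, or audio samples, the identical training loop can sit on top of a text, image, or speech feature extractor.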
What’s next for holistic AI (HAI) and multimodal understanding?
Data2vec models are currently trained separately for each modality. But X-HAL research results from Omnivore, FLAVA, and CM3 suggest that, over the horizon, we may be able to train a single AI model that solves challenging tasks across all modalities. Such a multimodal model would unlock many new opportunities. For example, it would further enhance our ability to comprehensively understand the content of social media posts in order to recognize hate speech or other harmful content. It could also help us build AR glasses that have a more comprehensive understanding of the world around them, unlocking exciting new applications in the metaverse. The driving factors are likely to be military and defense applications, providing advanced capabilities for drones, soldier-less warfare, and enhanced command-and-control decision making.
As interest in multimodality has grown, we at X-HAL, as holistic AI (HAI) consultants, want researchers to have great tools for quickly building and experimenting with multimodal, multitask models at scale.
You can visit our websites, Infoai.uk and Kuldeepuk-kohli.com, to stay up to date and see the latest papers and blogs we post.
Thanks