
CompactifAI

AI Model Compressor

The AI model compressor that makes AI systems faster, cheaper, and more energy efficient.

Have your AI model compressed and benefit from efficient, portable models that greatly reduce memory and disk-space requirements, making AI projects far more affordable to implement.


Want to get started quickly with our API?

Benefits of Using CompactifAI

Cost Savings

Lower your energy bills and reduce hardware expenses.

Privacy

Keep your data safe with localized AI models that don't rely on cloud-based systems.

Speed

Overcome hardware limitations and accelerate your AI-driven projects.

Sustainability

Contribute to a greener planet by cutting down on energy consumption.

Why CompactifAI?

Current AI models face significant inefficiencies, with parameter counts growing exponentially but accuracy only improving linearly.

This imbalance leads to:

Skyrocketing Computing Power Demands

The computational resources required are growing at an unsustainable rate.

Soaring Energy Costs

Increased energy consumption not only impacts the bottom line but also raises environmental concerns.

Limited Chip Supply

The scarcity of advanced chips limits innovation and business growth.

CompactifAI

The Solution

Revolutionizing AI Efficiency and Portability: CompactifAI leverages advanced tensor networks to compress foundational AI models, including large language models (LLMs). (A toy illustration of the general idea appears after the list of benefits below.)

This innovative approach offers several key benefits:

Enhanced efficiency

Drastically reduces the computational power required for AI operations.

Specialized AI models

Enables the development and deployment of smaller, specialized AI models locally, ensuring efficient and task-specific solutions.

Privacy and Governance Requirements

Supports the development of private and secure environments, which is crucial for the ethical, legal, and safe use of AI technologies.

Portability

Compress the model and put it on any device.
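
For readers curious how tensor-network compression works in principle, here is a minimal NumPy sketch. It is not Multiverse's proprietary algorithm, just a toy MPO-style factorization: a layer's weight matrix is reshaped into a four-index tensor and split into two smaller cores with a truncated SVD. All sizes and the rank are illustrative.

import numpy as np

def compress_layer(W, d_out=(32, 32), d_in=(32, 32), rank=16):
    """Split W of shape (prod(d_out), prod(d_in)) into two MPO-style cores."""
    o1, o2 = d_out
    i1, i2 = d_in
    # Reshape the (O, I) matrix into a 4-index tensor and regroup its legs
    # as (o1*i1) x (o2*i2), the pairing a matrix-product operator uses.
    T = W.reshape(o1, o2, i1, i2).transpose(0, 2, 1, 3).reshape(o1 * i1, o2 * i2)
    # Truncated SVD keeps only `rank` singular values (the "bond dimension").
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    core1 = U[:, :rank] * S[:rank]   # shape (o1*i1, rank)
    core2 = Vt[:rank, :]             # shape (rank, o2*i2)
    return core1, core2

W = np.random.randn(1024, 1024).astype(np.float32)
core1, core2 = compress_layer(W)
print(f"parameters: {W.size} -> {core1.size + core2.size} "
      f"({W.size / (core1.size + core2.size):.0f}x smaller)")

In practice, the truncation rank trades accuracy for size, which is why compressed models are typically retrained afterwards (see the retraining question in the FAQ below).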

Key Features

Size Reduction

Parameter Reduction

Faster Inference

Faster Retraining

Learn More

Watch the Video

Read the Paper

FAQ

How is CompactifAI sold?

You can now access CompactifAI models in three ways:

1. Via API on AWS – Our compressed and original models are available through our API, now listed on the AWS Marketplace (a sample request is sketched after this list).
2. License for private infrastructure – We provide enterprise licenses to deploy CompactifAI on your own on-premise or cloud environment.
3. Delivery through a service provider – We can compress your model and deliver it to your preferred inference provider or infrastructure partner.
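
As a rough illustration of option 1, the sketch below posts a chat request to an OpenAI-style endpoint. The URL, model id, and environment variable are placeholders, not confirmed CompactifAI identifiers; consult the AWS Marketplace listing for the real values.

import os
import requests

# Placeholder endpoint and model id -- substitute the values from your
# AWS Marketplace subscription; these are NOT confirmed CompactifAI names.
API_URL = "https://api.example.com/v1/chat/completions"
payload = {
    "model": "compressed-llama-3.3-70b",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize tensor networks in one sentence."}
    ],
}
headers = {"Authorization": f"Bearer {os.environ['COMPACTIFAI_API_KEY']}"}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])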

What LLM models does CompactifAI support?

CompactifAI is compatible with commercial and open-source models such as Llama 4 Scout, Llama 3.3 70B, DeepSeek R1, Mistral Small 3.1, and Microsoft Phi 4, among others. It needs access to the model weights themselves in order to compress a model.

OpenAI, by contrast, only provides an API to access (query) its models; because the weights are not accessible, Multiverse Computing's product cannot compress them.
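
The distinction in code: an open-weights model can be loaded locally, exposing the parameters a compressor operates on, while an API-only model never does. The sketch below uses "gpt2" as a small, freely downloadable stand-in for one of the supported models.

from transformers import AutoModelForCausalLM

# "gpt2" stands in for an open-weights model; a supported model from the
# list above would be loaded the same way (possibly behind a license gate).
model = AutoModelForCausalLM.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters available to compress")
# An API-only model (e.g., OpenAI's) exposes no parameters at all,
# so there is nothing for a compressor to act on.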

Where can the compressed models run?

One of the advantages of CompactifAI is that the compressed model can run anywhere: on-premise on x86 servers if security or governance is a concern, in the cloud, on your laptop, or on any other device. You choose.

How does CompactifAI affect RAG (Retrieval-Augmented Generation)?

One of the advantages of CompactifAI is that it reduces the resources needed to run RAG and greatly speeds up the inference time.
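
To see why, consider where the time goes in a RAG pipeline. The toy sketch below uses a trivial word-overlap retriever and a stub generate() standing in for the LLM call, which is the dominant cost and the step a compressed model accelerates.

docs = [
    "CompactifAI compresses LLMs with tensor networks.",
    "RAG retrieves relevant documents and passes them to the model as context.",
]

def retrieve(query, k=1):
    # Toy retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def generate(prompt):
    # Stub for the LLM call -- the expensive step a compressed model speeds up.
    return f"[model completion for a {len(prompt)}-character prompt]"

def answer(query):
    context = "\n".join(retrieve(query))
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

print(answer("What does RAG do?"))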

What are the hardware requirements for the model?

The minimum requirements to run the models are stated below. These are not necessarily the requirements for a real application: at inference time, the actual requirements vary with the latency (response time) and throughput (tokens per second) the system must deliver, and throughput determines how many simultaneous users you can serve. Consider these figures a lower bound; improving latency and throughput would require more powerful GPUs, such as NVIDIA H100 GPUs with 40GB or 80GB of VRAM. [source 1, source 2, source 3] A back-of-the-envelope check of these figures follows the list.

Training, 7B LLM at FP16:
GPU: 8 × NVIDIA A100 GPUs, each with 40 GB of VRAM
System RAM: 320 GB
Disk space: 40 GB

Training, 70B LLM at FP16:
GPU: 32 × NVIDIA A100 GPUs, each with 40 GB of VRAM
System RAM: 1,280 GB
Disk space: 200 GB

Inference, 7B LLM at FP16:
GPU: 1 × NVIDIA A10 GPU with 24 GB of VRAM (or a higher-end model)
System RAM: 16 GB
Disk space: 16 GB

Inference, 70B LLM at FP16:
GPU: 8 × NVIDIA A10 GPUs with 24 GB of VRAM (or higher-end models)
System RAM: 64 GB
Disk space: 140 GB
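
As a sanity check on these numbers, the weights alone at FP16 cost two bytes per parameter. The arithmetic sketch below, which deliberately ignores KV cache, activations, and runtime overhead, shows why a 7B model fits on a single 24 GB A10 while a 70B model needs several.

def weights_gib(params_billion, bytes_per_param=2):
    """GiB needed just to hold the weights (FP16 = 2 bytes/parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"7B  @ FP16: {weights_gib(7):5.1f} GiB -> fits one 24 GB A10")
print(f"70B @ FP16: {weights_gib(70):5.1f} GiB -> needs 8 x A10 (192 GB total)")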

Who retrains the model once it is compressed?

Customers can retrain the model themselves if they have the platform and resources to do so. Multiverse Computing can also provide retraining as a paid service.

Is CompactifAI open source? Do you share CompactifAI on Github?

No. It is not open source. We do not currently share CompactifAI on GitHub.

Can CompactifAI be applied to other large AI architectures, such as NLU models, ViTs (real-time video), or CNNs (image)?

Yes. We developed it to compress any linear or convolutional layer used in standard LLMs. If a model uses a custom layer, we can quickly add support for it in CompactifAI.

Can CompactifAI be used with multi-modal models?

It is on our roadmap. We are developing the next version of the compressor, which will support multi-modal models.

Ready to transform your AI Capabilities?

Contact us today to learn how CompactifAI can streamline your AI operations and drive your business forward.

Unlocking the Quantum AI Software Revolution.

CONTACT US

Interested in seeing our Quantum AI software in action? Contact us.
