Every model needs a place to live.
A trained LLM is just a file until it has hardware to run on. The size of your model determines the infrastructure you need. Docfy gives you two paths: download your model weights, or let us host your model in the EU.
Requirements
The right hardware for the right model.
Larger models require more GPU memory. Smaller models can run on consumer hardware. Choose the model size that fits your use case and your infrastructure budget. A rough memory-sizing sketch follows the tiers below.
Compact
8B
Fast inference, low latency. Ideal for routing, classification, and high-volume simple tasks.
- GPU VRAM: 16–24 GB
- System RAM: 32 GB
- Example GPU: RTX 4090
Production
14B
The workhorse. Handles document generation, reports, and most production tasks when properly fine-tuned.
- GPU VRAM: 24–48 GB
- System RAM: 64 GB
- Example GPU: RTX A6000
Advanced
32B
Higher reasoning capability. Suitable for complex analysis, dataset distillation, and demanding inference tasks.
- GPU VRAM: 48–80 GB
- System RAM: 128 GB
- Example GPU: RTX PRO 6000
Maximum
70B
Deep regulatory analysis, complex reasoning, edge cases. The highest capability available for on-premise deployment.
- GPU VRAM: 80–96 GB
- System RAM: 256 GB+
- Example GPU: RTX PRO 6000 96GB
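As a rough rule of thumb (our assumption, not a Docfy specification), inference memory is approximately parameter count times bytes per parameter, plus overhead for the KV cache and activations. A minimal sketch:

```python
# Rule-of-thumb VRAM estimate: weights = params x bytes/param, plus ~20%
# overhead for KV cache and activations. The 1.2 factor and byte widths
# are illustrative assumptions, not Docfy specifications.
def estimate_vram_gb(params_billions: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) needed to serve a model at a given precision."""
    return params_billions * bytes_per_param * overhead

for size in (8, 14, 32, 70):
    fp16 = estimate_vram_gb(size, bytes_per_param=2.0)  # 16-bit weights
    int8 = estimate_vram_gb(size, bytes_per_param=1.0)  # 8-bit quantized
    print(f"{size}B: ~{fp16:.0f} GB at 16-bit, ~{int8:.0f} GB at 8-bit")
```

At 16-bit precision this lands inside the tiers above for the 8B–32B models; the 70B tier's 80–96 GB budget implies reduced precision, since 70B at 8-bit works out to roughly 84 GB.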
Deployment
Two paths to production.
Option A
Download
Take your trained model weights and deploy them on your own GPU infrastructure. Full ownership, zero dependencies. You manage the hardware; we deliver updated weights monthly. A minimal loading sketch follows the list below.
- Standard model format
- Deployment documentation included
- Monthly retraining via secure transfer
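To make the download path concrete, here is a minimal sketch of serving delivered weights locally, assuming they ship in Hugging Face Transformers format (an assumption; the exact "standard model format" is not specified here, and the model path is hypothetical):

```python
# Minimal local inference sketch using Hugging Face Transformers.
# MODEL_PATH is a hypothetical location for the delivered weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/models/docfy-14b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # 16-bit weights: ~2 GB per billion parameters
    device_map="auto",           # place layers across available GPUs automatically
)

prompt = "Summarise the key obligations in this contract:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

In production you would typically put a dedicated inference server in front of the model rather than calling `generate` directly, but the memory footprint is the same either way.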
Option B
Hosted by Docfy
Your model runs on dedicated GPU servers in our Tier III facility in Riga. Not shared infrastructure: your own allocated hardware with guaranteed performance. We handle operations. A quick look at what the uptime SLA means in practice follows the list below.
- Dedicated GPU allocation
- ISO 27001 certified facility
- 99.9% uptime SLA
- Data stays in EU jurisdiction
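For scale, here is what a 99.9% uptime SLA permits in raw downtime. This is straightforward arithmetic, not a statement of Docfy's credit policy:

```python
# Downtime budget implied by a 99.9% uptime SLA.
UPTIME = 0.999
minutes_per_month = 30 * 24 * 60   # 43,200 minutes in a 30-day month
minutes_per_year = 365 * 24 * 60   # 525,600 minutes in a year

print(f"Monthly downtime budget: {(1 - UPTIME) * minutes_per_month:.0f} minutes")
print(f"Yearly downtime budget:  {(1 - UPTIME) * minutes_per_year / 60:.1f} hours")
# -> about 43 minutes per month, about 8.8 hours per year
```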
Live Operations
Your model, running 24/7.
Dedicated GPU servers serve your fine-tuned models in production. Models are trained in Europe and weights are deployed to production, with every step governed by a Work Order.
Facilities
Two locations. Training and production.
European Union
Training Lab
Where models are born. Fine-tuning, experimentation, server assembly, and testing. Dedicated climate-controlled room for GPU hardware. Monthly model updates are trained here before deployment to production.
- Function: Training & Assembly
- Role: Primary disaster recovery
European Union
Production Datacenter
Where models serve. Production GPU servers colocated in a Tier III facility with enterprise-grade power, cooling, and connectivity. Your LLM runs here 24/7.
- Tier: III
- Certifications: ISO 27001, 22301, PCI DSS
- Connectivity: Dedicated
- Dedicated GPU infrastructure per client
- 96 CPU cores per server
- Tier III datacenter certification
- 24h delivery in Europe
