Every model needs a place to live.
A trained LLM is just a file until it has hardware to run on. The size of your model determines the infrastructure you need. Docfy gives you two paths: download your model weights, or let us host your model in the EU.
Requirements
The right hardware for the right model.
Larger models require more GPU memory. Smaller models can run on consumer hardware. Choose the model size that fits your use case and your infrastructure budget. A rough memory-sizing sketch follows the tiers below.
Compact
8B
Fast inference, low latency. Ideal for routing, classification, and high-volume simple tasks.
- GPU VRAM: 16–24 GB
- System RAM: 32 GB
- Example GPU: RTX 4090
Production
14B
The workhorse. Handles document generation, reports, and most production tasks when properly fine-tuned.
- GPU VRAM: 24–48 GB
- System RAM: 64 GB
- Example GPU: RTX A6000
Advanced
32B
Higher reasoning capability. Suitable for complex analysis, dataset distillation, and demanding inference tasks.
- GPU VRAM: 48–80 GB
- System RAM: 128 GB
- Example GPU: RTX PRO 6000
Maximum
70B
Deep regulatory analysis, complex reasoning, edge cases. The highest capability available for on-premise deployment.
- GPU VRAM: 80–96 GB
- System RAM: 256 GB+
- Example GPU: RTX PRO 6000 96GB
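As a rough rule of thumb (our assumption, not a Docfy specification), inference memory is approximately parameter count times bytes per parameter, plus overhead for the KV cache and activations. A minimal sketch:

```python
# Rule-of-thumb VRAM estimate: weights = params x bytes/param, plus ~20%
# overhead for KV cache and activations. The 1.2 factor and byte widths
# are illustrative assumptions, not Docfy specifications.
def estimate_vram_gb(params_billions: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) needed to serve a model at a given precision."""
    return params_billions * bytes_per_param * overhead

for size in (8, 14, 32, 70):
    fp16 = estimate_vram_gb(size, bytes_per_param=2.0)  # 16-bit weights
    int8 = estimate_vram_gb(size, bytes_per_param=1.0)  # 8-bit quantized
    print(f"{size}B: ~{fp16:.0f} GB at 16-bit, ~{int8:.0f} GB at 8-bit")
```

At 16-bit precision this lands inside the tiers above for the 8B–32B models; the 70B tier's 80–96 GB budget implies reduced precision, since 70B at 8-bit works out to roughly 84 GB.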
Deployment
Two paths to production.
Option A
Download
Take your trained model weights and deploy them on your own GPU infrastructure. Full ownership, zero dependencies. You manage the hardware; we deliver updated weights monthly. A minimal loading sketch follows the list below.
- Standard model format
- Deployment documentation included
- Monthly retraining via secure transfer
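To make the download path concrete, here is a minimal sketch of serving delivered weights locally, assuming they ship in Hugging Face Transformers format (an assumption; the exact "standard model format" is not specified here, and the model path is hypothetical):

```python
# Minimal local inference sketch using Hugging Face Transformers.
# MODEL_PATH is a hypothetical location for the delivered weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/models/docfy-14b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # 16-bit weights: ~2 GB per billion parameters
    device_map="auto",           # place layers across available GPUs automatically
)

prompt = "Summarise the key obligations in this contract:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

In production you would typically put a dedicated inference server in front of the model rather than calling `generate` directly, but the memory footprint is the same either way.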
Option B
Hosted by Docfy
Your model runs on dedicated GPU servers in our Tier III facility in Riga. Not shared infrastructure: your own allocated hardware with guaranteed performance. We handle operations. A quick look at what the uptime SLA means in practice follows the list below.
- Dedicated GPU allocation
- ISO 27001 certified facility
- 99.9% uptime SLA
- Data stays in EU jurisdiction
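For scale, here is what a 99.9% uptime SLA permits in raw downtime. This is straightforward arithmetic, not a statement of Docfy's credit policy:

```python
# Downtime budget implied by a 99.9% uptime SLA.
UPTIME = 0.999
minutes_per_month = 30 * 24 * 60   # 43,200 minutes in a 30-day month
minutes_per_year = 365 * 24 * 60   # 525,600 minutes in a year

print(f"Monthly downtime budget: {(1 - UPTIME) * minutes_per_month:.0f} minutes")
print(f"Yearly downtime budget:  {(1 - UPTIME) * minutes_per_year / 60:.1f} hours")
# -> about 43 minutes per month, about 8.8 hours per year
```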
Live Operations
Your model, running 24/7.
Dedicated GPU servers serve your fine-tuned models in production. Models are trained in Europe and weights are deployed to production, with every step governed by a Work Order.
Facilities
Two locations. Training and production.
European Union
Training Lab
Where models are born. Fine-tuning, experimentation, server assembly, and testing. Dedicated climate-controlled room for GPU hardware. Monthly model updates are trained here before deployment to production.
- Function: Training & Assembly
- Role: Primary disaster recovery
European Union
Production Datacenter
Where models serve. Production GPU servers colocated in a Tier III facility with enterprise-grade power, cooling, and connectivity. Your LLM runs here 24/7.
- Tier: III
- Certifications: ISO 27001, 22301, PCI DSS
- Connectivity: Dedicated
- Dedicated GPU infrastructure per client
- 96 CPU cores per server
- Tier III datacenter certification
- 24h delivery in Europe
