Docfy

Every model needs a place to live.

A trained LLM is just a file until it has hardware to run on. The size of your model determines the infrastructure you need. Docfy gives you two paths: download your model weights and run them on your own infrastructure, or let us host your model in the EU.

Requirements

The right hardware for the right model.

Larger models require more GPU memory. Smaller models can run on consumer hardware. Choose the model size that fits your use case and your infrastructure budget.

Compact

8B

Fast inference, low latency. Ideal for routing, classification, and high-volume simple tasks.


GPU VRAM
16–24 GB
System RAM
32 GB
Example GPU
RTX 4090

Production

14B

The workhorse. Handles document generation, reports, and most production tasks when properly fine-tuned.


GPU VRAM
24–48 GB
System RAM
64 GB
Example GPU
RTX A6000

Advanced

32B

Higher reasoning capability. Suitable for complex analysis, dataset distillation, and demanding inference tasks.


GPU VRAM
48–80 GB
System RAM
128 GB
Example GPU
RTX PRO 6000

Maximum

70B

Deep regulatory analysis, complex reasoning, edge cases. The highest capability available for on-premise deployment.


GPU VRAM
80–96 GB
System RAM
256 GB+
Example GPU
RTX PRO 6000 96GB
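
The VRAM figures above follow, roughly, from parameter count times bytes per parameter plus serving overhead, with the larger tiers assuming quantized weights. A back-of-envelope sketch (the 1.2× overhead factor and the quantization choices are assumptions for illustration, not Docfy specifications):

```python
# Back-of-envelope VRAM estimate for LLM inference.
# bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit weights.
# The 1.2x serving overhead (KV cache, activations) is an assumption.

def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    return params_billion * bytes_per_param * overhead

for size in (8, 14, 32, 70):
    fp16 = estimate_vram_gb(size)          # half precision
    int8 = estimate_vram_gb(size, 1.0)     # 8-bit quantized
    print(f"{size}B: ~{fp16:.0f} GB FP16 / ~{int8:.0f} GB 8-bit")
```

For the 70B tier, the 80–96 GB figure lines up with 8-bit quantized weights rather than full FP16.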

Deployment

Two paths to production.

Option A

Download

Take your trained model weights and deploy them on your own GPU infrastructure. Full ownership, zero dependencies. You manage hardware, we deliver updated weights monthly.


  • Standard model format
  • Deployment documentation included
  • Monthly retraining via secure transfer
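
As an illustration of Option A, weights delivered in a standard format can be loaded with common open-source tooling. A minimal sketch using Hugging Face transformers, where the local weight directory and prompt are placeholders and the actual serving stack (vLLM, TGI, or similar) is your choice:

```python
# Minimal local inference sketch with Hugging Face transformers.
# "./docfy-weights" is a placeholder path for the delivered weight directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./docfy-weights"  # placeholder for the downloaded weights

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,   # half precision to halve memory vs. FP32
    device_map="auto",            # spread layers across available GPUs
)

prompt = "Summarise the attached compliance report in three bullet points."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```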

Option B

Hosted by Docfy

Your model runs on dedicated GPU servers in our Tier III facility in Riga. Not shared infrastructure — your own allocated hardware with guaranteed performance. We handle operations.


  • Dedicated GPU allocation
  • ISO 27001 certified facility
  • 99.9% uptime SLA
  • Data stays in EU jurisdiction
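
Hosted models are reached over HTTPS. As a purely hypothetical sketch, assuming an OpenAI-compatible endpoint (the base URL, model name, and credentials below are placeholders, not a documented Docfy API):

```python
# Hypothetical client call to a hosted model over an OpenAI-compatible API.
# base_url, model name, and API key are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.eu/v1",  # placeholder EU endpoint
    api_key="YOUR_API_KEY",                      # placeholder credential
)

response = client.chat.completions.create(
    model="your-finetuned-model",                # placeholder model name
    messages=[{"role": "user",
               "content": "Draft a GDPR data-processing summary."}],
    max_tokens=300,
)
print(response.choices[0].message.content)
```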

Live Operations

Your model, running 24/7.

Dedicated GPU servers run your fine-tuned models in production. Models are trained in Europe and weights are deployed to production, with every step governed by a Work Order.

Production Rack — EU · ONLINE

  • SRV-01 · 2× EPYC · Model Alpha — Compliance · SERVING
  • SRV-02 · 2× EPYC · Model Beta — Legal · SERVING
  • SRV-03 · 2× EPYC · Model Gamma — Medical · SERVING
  • SRV-04 · 2× EPYC · Provisioning… · STANDBY

Power & network: 4× 3200W Titanium (PWR A / PWR B) · 10GbE

GPU Utilization · SRV-01

  • GPU 0 · RTX PRO 6000 96GB · 87%
  • GPU 1 · RTX PRO 6000 96GB · 72%

VRAM Used
152 / 192 GB
Inference
38 tok/s
Uptime
99.97%

Model Delivery Pipeline
Training Lab · TRAINING → Weights → Production DC · SERVING
WO-089 · Deploy v2.1 weights → SRV-01 · IN PROGRESS
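
Before a Work Order like the one above moves new weights into production, the transferred archive can be verified against a published checksum. A minimal sketch, with file names as illustrative assumptions rather than part of Docfy's documented transfer process:

```python
# Verify a transferred weights archive against its SHA-256 checksum
# before deployment. File names are placeholders for illustration.
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

archive = Path("weights-v2.1.tar.gz")                  # placeholder archive
expected = Path("weights-v2.1.sha256").read_text().split()[0]

if sha256sum(archive) != expected:
    raise SystemExit("Checksum mismatch: do not deploy these weights.")
print("Checksum verified: safe to hand off to the production server.")
```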

Facilities

Two locations. Training and production.

European Union

Training Lab

Where models are born. Fine-tuning, experimentation, server assembly, and testing. Dedicated climate-controlled room for GPU hardware. Monthly model updates are trained here before deployment to production.


Function
Training & Assembly
Role
Primary disaster recovery

European Union

Production Datacenter

Where models serve. Production GPU servers colocated in a Tier III facility with enterprise-grade power, cooling, and connectivity. Your LLM runs here 24/7.


Tier
III
Certifications
ISO 27001, 22301, PCI DSS
Connectivity
Dedicated connectivity

  • Dedicated GPU infrastructure per client
  • 96 CPU cores per server
  • Tier III datacenter certification
  • 24h delivery in Europe