How Praxis evaluates AI tools for enterprise procurement
Thousands of AI tools are functionally indistinguishable: thin pass-throughs to upstream LLM APIs with no proprietary logic of their own. For SMBs, the cost of choosing a wrapper over a sovereign platform is measured in wasted procurement cycles, vendor dependency, and stranded training investment.
Praxis solves this with the Anti-Wrapper Verification Shield — an algorithmic scoring engine that evaluates every tool across six weighted dimensions to produce a Resilience Score (0-100) and assign each tool to a resilience tier.
Every tool is classified into one of five tiers based on its composite Resilience Score:
Sovereign tools own their core technology stack. They have independent model weights, proprietary algorithms, or custom training pipelines that cannot be replicated by switching upstream providers. They offer real data residency, local execution options, and deep integrations.
Examples: Datadog, Zapier, Salesforce, HubSpot, Monday.com
Durable tools have defensible workflows and proprietary data moats that create genuine switching costs. They may rely on some upstream infrastructure but add substantial value through custom logic, integrations, or domain expertise.
Examples: Canva AI, Jasper, Notion AI, Loom
Moderate tools are functional but have gaps in compliance documentation, limited integrations, or unclear differentiation from competitors. They are usable for non-critical workflows but require active monitoring.
Fragile tools carry significant upstream dependency. They may function today but face existential risk from API pricing changes, provider policy shifts, or competitive obsolescence. Evaluate alternatives before committing.
Wrappers are thin pass-throughs to upstream LLM APIs (OpenAI, Anthropic, etc.) with zero proprietary logic, no data moat, and no defensible competitive position. Any investment in a wrapper tool is at maximum vendor risk.
Every tool's Resilience Score is computed from six weighted dimensions. The scoring algorithm runs against our complete tool database; a simplified weighting sketch follows the list below:
- Does the tool have fine-tuned models, proprietary algorithms, or custom training pipelines? Or is it a thin UI over someone else's API?
- Native integrations, bidirectional data flow, and ecosystem breadth. Siloed tools with Zapier-only connections score lower.
- SOC 2, GDPR, HIPAA, ISO 27001, FedRAMP, weighted by rigor. No compliance documentation = automatic flag.
- Unit economics that imply real R&D investment rather than pure API margin arbitrage. Extremely low pricing triggers wrapper signals.
- Export formats, off-ramp quality, and migration feasibility. Tools that lock your data into proprietary formats score lower.
- Dependency on a single upstream provider, API ToS fragility, and multi-language/platform support.
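To make the weighting concrete, here is a minimal sketch of how a composite score could be assembled from per-dimension scores. The dimension keys and weights are illustrative placeholders, not our production values.

```python
# Minimal sketch of a weighted Resilience Score. The dimension keys and
# weights are illustrative placeholders, not production values.
DIMENSION_WEIGHTS = {
    "proprietary_tech": 0.25,     # fine-tuned models, custom training pipelines
    "integration_depth": 0.15,    # native integrations, bidirectional data flow
    "compliance": 0.20,           # SOC 2, GDPR, HIPAA, ISO 27001, FedRAMP
    "pricing_economics": 0.10,    # R&D-backed pricing vs. API margin arbitrage
    "data_portability": 0.15,     # export formats, off-ramp quality
    "vendor_independence": 0.15,  # upstream dependency, ToS fragility
}

def resilience_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into a 0-100 composite."""
    assert abs(sum(DIMENSION_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(w * dimension_scores[dim] for dim, w in DIMENSION_WEIGHTS.items())
```

Because the weights sum to 1 and each dimension is scored 0-100, the composite stays on the same 0-100 scale used in the grade table below.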
| Grade | Score Range | Tier | Procurement Guidance |
|---|---|---|---|
| A+ / A / A- | 80–100 | Sovereign | Strong vendor profile. Standard review sufficient. |
| B+ / B / B- | 65–79 | Durable | Good profile. Minor areas for vendor clarification. |
| C+ / C / C- | 50–64 | Moderate | Adequate. Request additional documentation on flagged areas. |
| D | 35–49 | Fragile | Below standard. Enhanced due diligence required. |
| F | 0–34 | Wrapper | Significant gaps. Escalate to security/legal review. |
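For reference, the score ranges in the table translate directly into a lookup. The sketch below reproduces the published band boundaries; the +/- sub-grade cutoffs within each band are not listed above, so it returns only the letter band.

```python
# Maps a composite Resilience Score to its grade band and tier, using the
# score ranges from the table above. Sub-grade (+/-) cutoffs are omitted
# because they are not specified here.
def grade_and_tier(score: float) -> tuple[str, str]:
    if score >= 80:
        return "A", "Sovereign"
    if score >= 65:
        return "B", "Durable"
    if score >= 50:
        return "C", "Moderate"
    if score >= 35:
        return "D", "Fragile"
    return "F", "Wrapper"
```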
The scoring engine automatically generates contextual flags when it detects patterns that warrant procurement attention.
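As an illustration, a flag pass might look like the sketch below. The trigger conditions echo the dimension notes above (missing compliance documentation, extremely low pricing, single-provider dependency); the record fields, flag names, and thresholds are placeholders rather than our production rules.

```python
# Illustrative flag generation. Field names, flag labels, and the pricing
# threshold are hypothetical; only the trigger patterns come from the
# dimension descriptions above.
def contextual_flags(tool: dict) -> list[str]:
    flags = []
    if not tool.get("compliance_certifications"):
        flags.append("NO_COMPLIANCE_DOCUMENTATION")
    price = tool.get("monthly_price_per_seat_usd")
    if price is not None and price < 5:  # "extremely low pricing" signal
        flags.append("PRICING_SUGGESTS_API_PASSTHROUGH")
    if len(tool.get("upstream_providers", [])) == 1:
        flags.append("SINGLE_UPSTREAM_DEPENDENCY")
    return flags
```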
Quantify the cost of tool failure for your specific organization. Input your team parameters and get a personalized risk/savings projection. Try it →
Generate procurement-ready RFP documents with resilience criteria baked in. Select tools, define requirements, and export a structured evaluation framework. Build an RFP →
Scores, rankings, and eliminations are computed independently of commercial relationships. Affiliate partnerships exist in a separate layer and never touch the evaluation engine.
Every score is derived from documented factors. No opaque algorithms or hidden weights.
We evaluate open source and proprietary tools using the same framework and criteria.
We re-evaluate tools when vendors change practices, pricing, or ownership. Assessments reflect the latest information we have.
We apply the same methodology to ourselves. Our practices and limitations are disclosed.
Every report includes specific due diligence questions and procurement checkpoints.
Praxis maintains affiliate partnerships with some tools in our database. These partnerships are structurally separated from the evaluation engine.
The wall: Affiliate relationships exist in a separate layer from scoring. The evaluation engine does not receive, process, or consider any information about which tools have commercial agreements with Praxis. Scores are computed from technical criteria only. This separation is architectural, not policy-based — the scoring code literally does not have access to partnership data.
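A simplified illustration of what that separation looks like in code: the evaluation entry point accepts only technical inputs, so partnership data has no path into a score. The type and function names below are illustrative, not our production code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TechnicalProfile:
    """Evaluation inputs only; no commercial or affiliate fields exist on this type."""
    dimension_scores: dict[str, float]  # per-dimension scores, 0-100
    weights: dict[str, float]           # dimension weights, summing to 1

def evaluate(profile: TechnicalProfile) -> float:
    """Compute a Resilience Score from technical criteria alone."""
    return sum(w * profile.dimension_scores[dim] for dim, w in profile.weights.items())

# Partnership records live behind a separate interface (e.g. a partners module)
# that the evaluation code never imports, so the separation is structural
# rather than procedural.
```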
Verification: Every score Praxis produces is reproducible from the published methodology. If you question whether a partner tool received favorable treatment, you can audit the score against the criteria documented above.
Full disclosure: See our Partners & Transparency page for a complete list of commercial relationships and our commitments around evaluation integrity.