How Praxis evaluates AI tools for enterprise procurement
Thousands of AI tools are functionally indistinguishable: thin pass-throughs to upstream LLM APIs with no proprietary logic of their own. For SMBs, the cost of choosing a wrapper over a sovereign platform is measured in wasted procurement cycles, vendor dependency, and stranded training investment.
Praxis solves this with the Anti-Wrapper Verification Shield — an algorithmic scoring engine that evaluates every tool across six weighted dimensions to produce a Resilience Score (0-100) and assign each tool to a resilience tier.
Every tool is classified into one of five tiers based on its composite Resilience Score:
Sovereign tools own their core technology stack. They have independent model weights, proprietary algorithms, or custom training pipelines that cannot be replicated by switching upstream providers. They offer real data residency, local execution options, and deep integrations.
Examples: Datadog, Zapier, Salesforce, HubSpot, Monday.com
Durable tools have defensible workflows and proprietary data moats that create genuine switching costs. They may rely on some upstream infrastructure but add substantial value through custom logic, integrations, or domain expertise.
Examples: Canva AI, Jasper, Notion AI, Loom
Moderate tools are functional but have gaps in compliance documentation, limited integrations, or unclear differentiation from competitors. They are usable for non-critical workflows but require active monitoring.
Fragile tools carry significant upstream dependency. They may function today but face existential risk from API pricing changes, provider policy shifts, or competitive obsolescence. Evaluate alternatives before committing.
Wrappers are thin pass-throughs to upstream LLM APIs (OpenAI, Anthropic, etc.) with zero proprietary logic, no data moat, and no defensible competitive position. Any investment in a wrapper tool is at maximum vendor risk.
Every tool's Resilience Score is computed from six weighted dimensions. The scoring algorithm runs against our complete tool database; a simplified weighting sketch follows the list below:
- Does the tool have fine-tuned models, proprietary algorithms, or custom training pipelines? Or is it a thin UI over someone else's API?
- Native integrations, bidirectional data flow, and ecosystem breadth. Siloed tools with Zapier-only connections score lower.
- SOC 2, GDPR, HIPAA, ISO 27001, FedRAMP, weighted by rigor. No compliance documentation = automatic flag.
- Unit economics that imply real R&D investment rather than pure API margin arbitrage. Extremely low pricing triggers wrapper signals.
- Export formats, off-ramp quality, and migration feasibility. Tools that lock your data into proprietary formats score lower.
- Dependency on a single upstream provider, API ToS fragility, and multi-language/platform support.
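To make the weighting concrete, here is a minimal sketch of how a composite score could be assembled from per-dimension scores. The dimension keys and weights are illustrative placeholders, not our production values.

```python
# Minimal sketch of a weighted Resilience Score. The dimension keys and
# weights are illustrative placeholders, not production values.
DIMENSION_WEIGHTS = {
    "proprietary_tech": 0.25,     # fine-tuned models, custom training pipelines
    "integration_depth": 0.15,    # native integrations, bidirectional data flow
    "compliance": 0.20,           # SOC 2, GDPR, HIPAA, ISO 27001, FedRAMP
    "pricing_economics": 0.10,    # R&D-backed pricing vs. API margin arbitrage
    "data_portability": 0.15,     # export formats, off-ramp quality
    "vendor_independence": 0.15,  # upstream dependency, ToS fragility
}

def resilience_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into a 0-100 composite."""
    assert abs(sum(DIMENSION_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(w * dimension_scores[dim] for dim, w in DIMENSION_WEIGHTS.items())
```

Because the weights sum to 1 and each dimension is scored 0-100, the composite stays on the same 0-100 scale used in the grade table below.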
| Grade | Score Range | Tier | Procurement Guidance |
|---|---|---|---|
| A+ / A / A- | 80–100 | Sovereign | Strong vendor profile. Standard review sufficient. |
| B+ / B / B- | 65–79 | Durable | Good profile. Minor areas for vendor clarification. |
| C+ / C / C- | 50–64 | Moderate | Adequate. Request additional documentation on flagged areas. |
| D | 35–49 | Fragile | Below standard. Enhanced due diligence required. |
| F | 0–34 | Wrapper | Significant gaps. Escalate to security/legal review. |
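For reference, the score ranges in the table translate directly into a lookup. The sketch below reproduces the published band boundaries; the +/- sub-grade cutoffs within each band are not listed above, so it returns only the letter band.

```python
# Maps a composite Resilience Score to its grade band and tier, using the
# score ranges from the table above. Sub-grade (+/-) cutoffs are omitted
# because they are not specified here.
def grade_and_tier(score: float) -> tuple[str, str]:
    if score >= 80:
        return "A", "Sovereign"
    if score >= 65:
        return "B", "Durable"
    if score >= 50:
        return "C", "Moderate"
    if score >= 35:
        return "D", "Fragile"
    return "F", "Wrapper"
```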
The scoring engine automatically generates contextual flags when it detects patterns that warrant procurement attention.
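As an illustration, a flag pass might look like the sketch below. The trigger conditions echo the dimension notes above (missing compliance documentation, extremely low pricing, single-provider dependency); the record fields, flag names, and thresholds are placeholders rather than our production rules.

```python
# Illustrative flag generation. Field names, flag labels, and the pricing
# threshold are hypothetical; only the trigger patterns come from the
# dimension descriptions above.
def contextual_flags(tool: dict) -> list[str]:
    flags = []
    if not tool.get("compliance_certifications"):
        flags.append("NO_COMPLIANCE_DOCUMENTATION")
    price = tool.get("monthly_price_per_seat_usd")
    if price is not None and price < 5:  # "extremely low pricing" signal
        flags.append("PRICING_SUGGESTS_API_PASSTHROUGH")
    if len(tool.get("upstream_providers", [])) == 1:
        flags.append("SINGLE_UPSTREAM_DEPENDENCY")
    return flags
```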
Quantify the cost of tool failure for your specific organization. Input your team parameters and get a personalized risk/savings projection. Try it →
Generate procurement-ready RFP documents with resilience criteria baked in. Select tools, define requirements, and export a structured evaluation framework. Build an RFP →
Scores, rankings, and eliminations are computed independently of commercial relationships. Affiliate partnerships exist in a separate layer and never touch the evaluation engine.
Every score is derived from documented factors. No opaque algorithms or hidden weights.
We evaluate open source and proprietary tools using the same framework and criteria.
We re-evaluate tools when vendors change practices, pricing, or ownership. Assessments reflect the latest information we have.
We apply the same methodology to ourselves. Our practices and limitations are disclosed.
Every report includes specific due diligence questions and procurement checkpoints.
Praxis maintains affiliate partnerships with some tools in our database. These partnerships are structurally separated from the evaluation engine.
The wall: Affiliate relationships exist in a separate layer from scoring. The evaluation engine does not receive, process, or consider any information about which tools have commercial agreements with Praxis. Scores are computed from technical criteria only. This separation is architectural, not policy-based — the scoring code literally does not have access to partnership data.
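A simplified illustration of what that separation looks like in code: the evaluation entry point accepts only technical inputs, so partnership data has no path into a score. The type and function names below are illustrative, not our production code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TechnicalProfile:
    """Evaluation inputs only; no commercial or affiliate fields exist on this type."""
    dimension_scores: dict[str, float]  # per-dimension scores, 0-100
    weights: dict[str, float]           # dimension weights, summing to 1

def evaluate(profile: TechnicalProfile) -> float:
    """Compute a Resilience Score from technical criteria alone."""
    return sum(w * profile.dimension_scores[dim] for dim, w in profile.weights.items())

# Partnership records live behind a separate interface (e.g. a partners module)
# that the evaluation code never imports, so the separation is structural
# rather than procedural.
```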
Verification: Every score Praxis produces is reproducible from the published methodology. If you question whether a partner tool received favorable treatment, you can audit the score against the criteria documented above.
Full disclosure: See our Partners & Transparency page for a complete list of commercial relationships and our commitments around evaluation integrity.