How to run an AI pilot with multiple vendors

By Anshul Bhutani

7 Oct 2025

After much consideration, you’ve decided to roll out an AI solution for your firm and have shortlisted 3 vendors. Each demo was impressive, but how do you determine which one is right for you?

For firms that haven’t zeroed in on a single vendor and have the resources to run side-by-side evaluations, we recommend conducting comparative pilots built around real-life use cases. Here’s how we suggest structuring these pilots to maximize your chances of success.

Tip: Involve your vendor as early as possible so they can properly prepare. IT restrictions or integration requirements can take additional time, and early coordination ensures a smoother pilot experience.

4 Parts of a Successful Pilot

The objective of your pilot is to gather unbiased feedback on which platform best meets your firm’s needs. A pilot can be divided into four distinct phases:

  1. Task Selection

  2. User Selection

  3. Testing

  4. Implementation Experience Review

Each phase should be conducted carefully to avoid introducing bias and ensure that your evaluation reflects real-world usage.

Task Selection

The first stage of the pilot is to identify the set of tasks you want to benchmark. We recommend that our clients organize these tasks into four broad categories: 

  1. Drafting: simple motions, petitions, research notes, agreements, etc.

  2. Analysis: issue identification, redlining, fact discovery, due diligence, etc.

  3. Summarization: chronologies, deposition summaries, contract summaries, etc. 

  4. Research: closed research (across existing datasets) and open research (across the internet).

Try to identify at least four distinct tasks that your firm performs in each of these categories. For each task, clearly define your ‘must-haves’ and ‘nice-to-haves’. For example, some firms prioritize AI outputs that match in-house styles and templates, while others focus primarily on the quality of the legal analysis in the final deliverable.

For each task, prepare an evaluation rubric that, at minimum, tracks the following (a simple sketch of such a rubric follows the list): 

  1. Effort required to generate the output

  2. Quality of the output

  3. Additional hours spent converting the output into a final deliverable
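
If it helps to keep scoring consistent across evaluators, this rubric can live in a shared spreadsheet or a lightweight structured record. Below is a minimal sketch in Python; the field names and 1-5 scales are illustrative assumptions rather than a prescribed format.

```python
# Minimal sketch of a per-task evaluation rubric record.
# Field names and the 1-5 scales are illustrative assumptions,
# not a prescribed format; adapt them to your own rubric.
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class TaskEvaluation:
    task: str              # e.g. "Summarize deposition transcript"
    category: str          # Drafting / Analysis / Summarization / Research
    output_label: str      # blinded label during grading, e.g. "Output A"
    effort_score: int      # 1-5: effort required to generate the output
    quality_score: int     # 1-5: quality of the output
    hours_to_final: float  # additional hours to reach a final deliverable
    notes: str = ""


def export_rubric(rows: list[TaskEvaluation], path: str) -> None:
    """Write completed rubric rows to a CSV for side-by-side comparison."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fld.name for fld in fields(TaskEvaluation)]
        )
        writer.writeheader()
        writer.writerows(asdict(row) for row in rows)
```

A shared spreadsheet works just as well; the point is that every evaluator scores the same three dimensions for every task.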

User Selection

This is often the trickiest part of running the pilot. Attorneys and support staff already face significant pressure from client deliverables, so securing enough time for AI evaluations can be challenging. 

As a rule of thumb, we recommend involving 30-40% of your firm in the pilot, distributed evenly across PQE levels and practice areas. Key considerations include:

  • Select a representative sample of users, not just the most tech-savvy. The goal is to select a product that works for the entire firm, not only for one or two individuals. 

  • Include support staff, such as paralegals and research assistants, alongside attorneys.

  • Add extra users. You don’t want the pilot to stall because a few teams got caught in a workload spike! 

  • Set clear expectations. The evaluation rubrics you create are critical for providing your users with clear standards for assessment.

Testing

Now we come to the fun part. You’ve already lined up the tasks you want to assess and the user set. The next step is to test the products!

We strongly recommend you divide the evaluation into two parts: quality and user experience. 

Evaluating Quality 

The most effective way to assess the quality of AI outputs is to generate responses with every tool and have your users grade them blindly. While your vendor will surely assist with this, ensure that you produce the test outputs yourself; a simple way to blind them is sketched after the list below.

When performing quality evaluations, keep the following in mind:

  1. Provide all responses at once. Evaluations are comparative, so users need to see every output upfront.

  2. Keep timelines short to maintain focus and consistency.

  3. Distribute tasks to avoid overburdening the same users with every evaluation.

  4. Maintain consistency per task. For each task, the same user (or set of users) should evaluate all vendor outputs. Avoid situations where, for a particular drafting exercise, User A evaluates Vendors X and Y while User B evaluates Vendor Z; this inconsistency makes the scores harder to compare.
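
One low-effort way to keep grading blind is to strip the vendor names and shuffle the outputs before they reach evaluators, keeping a private key so scores can be attributed afterwards. The sketch below is purely illustrative: it assumes each task’s outputs are collected as files named after their vendor, and the folder layout and labels are assumptions, not a required setup.

```python
# Sketch: blind a set of vendor outputs for a single task by relabelling
# them "Output_A", "Output_B", ... in random order, keeping a private key
# so scores can be mapped back to vendors after grading.
# The folder layout and file naming are assumptions for illustration.
import json
import random
import shutil
import string
from pathlib import Path


def blind_outputs(task_dir: str, out_dir: str, key_path: str) -> None:
    src = Path(task_dir)   # e.g. contains "vendor_x.docx", "vendor_y.docx"
    dst = Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)

    files = sorted(src.iterdir())
    random.shuffle(files)  # random order so the labels carry no signal

    key = {}
    for label, f in zip(string.ascii_uppercase, files):
        blinded_name = f"Output_{label}{f.suffix}"
        shutil.copy(f, dst / blinded_name)
        key[blinded_name] = f.stem  # e.g. {"Output_A.docx": "vendor_y"}

    # Keep the key with the pilot coordinator, not the evaluators.
    Path(key_path).write_text(json.dumps(key, indent=2), encoding="utf-8")
```

The coordinator who holds the key can also make sure that, for each task, the same evaluator (or set of evaluators) grades every blinded output, which keeps point 4 above intact.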

Evaluating User Experience 

Beyond output quality, you may want to ensure that the platform is intuitive and well-designed. Clunky interfaces or slow processing times can significantly hamper adoption. We recommend giving users two weeks of access to explore each platform freely.

During this stage, also evaluate the vendor’s support and training. Assess training sessions, knowledge bases, support response times, and integrations with everyday tools like Word. Each of these factors should be considered separately to get a full picture of the user experience. 

Implementation Experience

While testing is underway, you should also assess the vendor’s level of engagement. Remember, you’re selecting a long-term partner. Some key considerations are:

  • How well do they understand your requirements?

  • How do they track usage and improve adoption?

  • How quickly do they respond to feature requests? 

  • How clearly defined is their product roadmap? 

  • How effectively do they set up and handle support requests?

Your pilot is the best way to cut through the sales rhetoric and see how the promises hold up in practice. Think long term: choose a vendor that will support you closely for the long run, not just during a two-week pilot.

Concluding the Pilot

By the end of the pilot, you’ll have all the data points you need to choose the right vendor. From there, it’s a matter of negotiating, procuring licenses, setting up integrations, and rolling the solution out to your lawyers. Easy!
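
If it helps to structure that decision, the completed rubric rows can be rolled up per vendor once the blind key has been applied. The sketch below simply averages each dimension per vendor; how you weight those dimensions against each other is a judgment call for your firm and is not prescribed here.

```python
# Sketch: average completed rubric rows per vendor so the three rubric
# dimensions can be compared side by side. Illustrative only; it assumes
# rows shaped like the rubric records sketched earlier, with the blind
# key already applied so each row carries its real vendor name.
from collections import defaultdict
from statistics import mean


def summarize_by_vendor(rows: list[dict]) -> dict[str, dict[str, float]]:
    metrics = ("effort_score", "quality_score", "hours_to_final")
    grouped: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["vendor"]].append(row)
    return {
        vendor: {m: round(mean(r[m] for r in vendor_rows), 2) for m in metrics}
        for vendor, vendor_rows in grouped.items()
    }
```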

Choosing an AI partner shouldn’t be a leap of faith. A well-run, comparative pilot turns a “great demo” into a “proven fit” with real tasks, real users, and real timelines.

Starting your journey? Contact our team to learn how to tailor a pilot to your firm and accelerate your AI adoption today.