AI Architectures
Choreographing a bespoke network of models to support Labs' interface
Abstract
Impact
Translating a custom article format from handcrafted to AI-enabled, I organized a backend flow of models to support a novel interface for interacting with generative AI.
Outcome
From the first successful generations to the current framework, we took a rather unreliable flow lasting 5-6 minutes down to about 5 seconds with few hiccups.
Challenges evolved from generating working outputs at all, to speeding up the flow, to improving quality, to making generations increasingly interrogable.
Role: Developer
Challenge
Our software's input requirements were strict and unique, so the first obstacle was getting models to produce working data digestible by our stack.
From there, the task became improving performance and layering on features to support UX improvements.
Goal
Enable a newly responsive AI surface with multiple entry points, where users can prompt from a variety of locations within a cohesive interface.
Principles Guiding the Framework:
Delegation
Speed vs Quality
Resilience
Easier Prompting
Memory and Context
Our orchestration was designed for educational explainers, needing to combine sourced and generated information while drawing on a consistent body of context for follow-up questions.
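A minimal sketch of that shape, assuming a generic callModel helper and illustrative stage names (none of these identifiers come from our actual stack):

```typescript
// Illustrative only: sourced facts and generated prose combine into one
// story context, which then persists as the shared body of context that
// follow-up questions draw on.
interface SourcedFact { source: string; text: string }
interface StoryContext { topic: string; facts: SourcedFact[]; history: string[] }

// Stand-in for whatever model client the stack actually uses.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function generateExplainer(topic: string): Promise<StoryContext> {
  const ctx: StoryContext = { topic, facts: [], history: [] };
  // Research stage: gather sourced information first.
  const research = await callModel("You are a researcher.", `Key facts about ${topic}.`);
  ctx.facts.push({ source: "research stage", text: research });
  // Writing stage: generate prose grounded in those facts.
  const draft = await callModel("You are a writer.", `Write an explainer from: ${research}`);
  ctx.history.push(draft);
  return ctx; // retained so later questions share the same context
}
```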
Started here:
1: No-Code
Did it! But 14 (😅) steps in a slow and unintegrated external no-code tool.
Modeled the steps after our own writing and design process.
~4-6 minutes
2: Connecting
Made the external tool accessible to the frontend via an API, returning prompts (sketched below).
Next steps: building an interface, and finding more ways to interact within it.
4-6 minutes
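Roughly the shape of that connection, with a made-up endpoint and response format standing in for the real ones:

```typescript
// Hypothetical bridge from the frontend to the external no-code tool.
// The URL, payload, and "prompts" field are illustrative, not the real API.
async function requestGeneration(topic: string): Promise<string[]> {
  const res = await fetch("https://example.com/nocode-flow/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: topic }),
  });
  if (!res.ok) throw new Error(`Flow failed: ${res.status}`);
  const data = await res.json();
  return data.prompts; // the tool returns prompts for the frontend to use
}
```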
3: Preliminary UI
Intermediate step. Working UI, but prohibitively long wait times.
Objective number one was generation time.
~4-6 minutes
Interlude: started building everything internally here.
Tighter control provided more ways for models to coordinate and closer parity with the interface.
4: No more no-code
Once over the in-house hurdle, we experimented quickly, chasing speed.
Mainly toying with steps and prompts, still using simple, linear structures. ~70% success rate.
~2-3 minutes
5: Beat a minute
Compartmentalized tasks to simplify the initial step, then ran the big ones in parallel (sketched below).
Strategized around trade-offs between speed, quality, and reliability. Began linking calls.
~1 minute
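The parallelization pattern, sketched with assumed task names and a stand-in model call:

```typescript
// Stand-in for the real model client.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function generateStory(topic: string): Promise<string[]> {
  // Simplified initial step: one quick call plans the sections.
  const plan = await callModel("Plan concisely.", `Outline sections for ${topic}.`);
  const sections = plan.split("\n").filter(Boolean);

  // The big generation tasks then fan out concurrently, so total latency
  // tracks the slowest call rather than the sum of all of them.
  return Promise.all(sections.map((s) => callModel("Write one section.", s)));
}
```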
6: Made reliable
Function calling pushed the success rate to ~95% (sketched below).
Added a research step that progressively triggers slides to be created and added to the story.
~5 seconds
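A sketch of the idea, assuming a generic JSON-schema tool definition rather than any particular provider's API; add_slide and the slide fields are invented for illustration:

```typescript
// Constraining the model to emit arguments matching a schema (instead of
// free text we parse) is what pushed reliability up.
const addSlideTool = {
  name: "add_slide",
  description: "Append a slide to the story as research progresses.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string" },
      body: { type: "string" },
      imagePrompt: { type: "string" },
    },
    required: ["title", "body"],
  },
};

interface Slide { title: string; body: string; imagePrompt?: string }

// Each add_slide call from the research step is validated and pushed to
// the story immediately, so slides appear progressively, not at the end.
function handleToolCall(name: string, args: unknown, story: Slide[]): void {
  if (name !== "add_slide") return;
  const slide = args as Slide;
  if (!slide.title || !slide.body) throw new Error("Schema violation");
  story.push(slide); // triggers the UI to render the new slide
}
```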
7: Memory + Thinking
Shared memory contextualizes information across interactions (sketched below).
Brushed up against a quality-vs-time barrier, so we added a richer, optional thinking mode.
~5 or 30 seconds
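Sketched with invented names: one shared memory object that every interaction reads and writes, plus an opt-in flag for the slower thinking path:

```typescript
interface Memory { entries: { role: string; text: string }[] }

// Stand-in for the real model client.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function ask(memory: Memory, question: string, thinking = false): Promise<string> {
  // Every interaction sees the same accumulated context.
  const context = memory.entries.map((e) => `${e.role}: ${e.text}`).join("\n");
  const system = thinking
    ? "Reason step by step before answering." // richer ~30s path
    : "Answer directly and briefly.";         // fast ~5s default
  const answer = await callModel(system, `${context}\nuser: ${question}`);
  memory.entries.push({ role: "user", text: question }, { role: "model", text: answer });
  return answer;
}
```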
We're still actively developing the underlying AI framework that supports the product, but we've reached a point where the fundamental issues, namely speed and reliability, are performing well enough to let us focus on fine-tuning the architecture.
Acute focuses include continually improving reliability, raising text quality through better system prompts, sourcing more kinds of assets, and adding capabilities.
Bigger sprints include devising ways for users to customize their own system prompts to tailor generation to their learning preferences (sketched below), as well as continuing to experiment with how the product's UI/UX interfaces with the underlying models.
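One way those custom system prompts could compose, with made-up preference fields:

```typescript
// Illustrative: user learning preferences fold into the system prompt.
interface LearningPrefs { tone?: string; depth?: "overview" | "deep"; examples?: boolean }

function buildSystemPrompt(base: string, prefs: LearningPrefs): string {
  const parts = [base];
  if (prefs.tone) parts.push(`Write in a ${prefs.tone} tone.`);
  if (prefs.depth === "deep") parts.push("Favor depth over brevity.");
  if (prefs.examples) parts.push("Give a concrete example for each concept.");
  return parts.join(" ");
}
```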
Generation Time: ~5 minutes → ~5 seconds
Generation Reliability: ~50% → ~95%
Memory and Context: implemented a memory-and-context strategy that supports new features and UX improvements.
Interrogability, meaning the capacity for generations to be interrogated (how easy it is for users to converse with pieces of what the models produce, ranging from pages, to images and paragraphs, to parts of images, phrases, and words, for clarity, context, or verifiability): 1 → plethora.
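Roughly how a selection anchors a follow-up, with assumed field names:

```typescript
// Illustrative: any selectable piece of a generation carries enough
// context to ground a conversation about it.
interface Selection {
  kind: "page" | "image" | "paragraph" | "phrase" | "word";
  content: string;       // the selected text or an image description
  parentContext: string; // surrounding material for grounding
}

function buildInterrogationPrompt(sel: Selection, question: string): string {
  return [
    `The user selected a ${sel.kind}: "${sel.content}"`,
    `Surrounding context: ${sel.parentContext}`,
    `Their question: ${question}`,
    "Answer for clarity, context, or verifiability, as appropriate.",
  ].join("\n");
}
```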