AI Architectures
Choreographing a bespoke network of models to support Labs' interface
Abstract
Impact
Translating a custom article format from handcrafted to AI-enabled, I organized a backend flow of models to support a novel interface for interacting with generative AI.
Outcome
From the first successful generations to the current framework, we took a rather unreliable flow lasting 5-6 minutes down to about 5 seconds with few hiccups.
Challenges evolved from generating working outputs at all, to speeding up the flow, to improving quality, to making generations increasingly interrogable.
Role: Developer
Challenge
Our software's input requirements were strict and unique, so the first obstacle was getting models to produce working data digestible by our stack.
From there, the task became improving performance and layering on features to support UX improvements.
Goal
Enable a newly responsive AI surface with multiple entry points, where users can prompt from a variety of locations within a cohesive interface.
Principles Guiding the Framework:
Delegation
Speed vs Quality
Resilience
Easier Prompting
Memory and Context
Our orchestration was designed for educational explainers, needing to combine sourced and generated information while drawing on a consistent body of context for follow-up questions.
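A minimal sketch of that shape, assuming a generic callModel helper and illustrative stage names (none of these identifiers come from our actual stack):

```typescript
// Illustrative only: sourced facts and generated prose combine into one
// story context, which then persists as the shared body of context that
// follow-up questions draw on.
interface SourcedFact { source: string; text: string }
interface StoryContext { topic: string; facts: SourcedFact[]; history: string[] }

// Stand-in for whatever model client the stack actually uses.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function generateExplainer(topic: string): Promise<StoryContext> {
  const ctx: StoryContext = { topic, facts: [], history: [] };
  // Research stage: gather sourced information first.
  const research = await callModel("You are a researcher.", `Key facts about ${topic}.`);
  ctx.facts.push({ source: "research stage", text: research });
  // Writing stage: generate prose grounded in those facts.
  const draft = await callModel("You are a writer.", `Write an explainer from: ${research}`);
  ctx.history.push(draft);
  return ctx; // retained so later questions share the same context
}
```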
Started here:
1: No-Code
Did it! But 14 (😅) steps in a slow and unintegrated external no-code tool.
Modeled the steps after our own writing and design process.
~4-6 minutes
2: Connecting
Made the external tool accessible to the frontend via an API, returning prompts (sketched below).
Next steps: building an interface, and finding more ways to interact within it.
4-6 minutes
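Roughly the shape of that connection, with a made-up endpoint and response format standing in for the real ones:

```typescript
// Hypothetical bridge from the frontend to the external no-code tool.
// The URL, payload, and "prompts" field are illustrative, not the real API.
async function requestGeneration(topic: string): Promise<string[]> {
  const res = await fetch("https://example.com/nocode-flow/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: topic }),
  });
  if (!res.ok) throw new Error(`Flow failed: ${res.status}`);
  const data = await res.json();
  return data.prompts; // the tool returns prompts for the frontend to use
}
```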
3: Preliminary UI
Intermediate step. Working UI, but prohibitively long wait times.
Objective number one was generation time.
~4-6 minutes
Interlude: started building everything internally here.
Tighter control provided more ways for models to coordinate and closer parity with the interface.
4: No more no-code
Once over the in-house hurdle, we experimented quickly, chasing speed.
Mainly toying with steps and prompts, still using simple, linear structures. ~70% success rate.
~2-3 minutes
5: Beat a minute
Compartmentalized tasks to simplify the initial step, then ran the big ones in parallel (sketched below).
Strategized around trade-offs between speed, quality, and reliability. Began linking calls.
~1 minute
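The parallelization pattern, sketched with assumed task names and a stand-in model call:

```typescript
// Stand-in for the real model client.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function generateStory(topic: string): Promise<string[]> {
  // Simplified initial step: one quick call plans the sections.
  const plan = await callModel("Plan concisely.", `Outline sections for ${topic}.`);
  const sections = plan.split("\n").filter(Boolean);

  // The big generation tasks then fan out concurrently, so total latency
  // tracks the slowest call rather than the sum of all of them.
  return Promise.all(sections.map((s) => callModel("Write one section.", s)));
}
```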
6: Made reliable
Function calling pushed the success rate to ~95% (sketched below).
Added a research step that progressively triggers slides to be created and added to the story.
~5 seconds
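A sketch of the idea, assuming a generic JSON-schema tool definition rather than any particular provider's API; add_slide and the slide fields are invented for illustration:

```typescript
// Constraining the model to emit arguments matching a schema (instead of
// free text we parse) is what pushed reliability up.
const addSlideTool = {
  name: "add_slide",
  description: "Append a slide to the story as research progresses.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string" },
      body: { type: "string" },
      imagePrompt: { type: "string" },
    },
    required: ["title", "body"],
  },
};

interface Slide { title: string; body: string; imagePrompt?: string }

// Each add_slide call from the research step is validated and pushed to
// the story immediately, so slides appear progressively, not at the end.
function handleToolCall(name: string, args: unknown, story: Slide[]): void {
  if (name !== "add_slide") return;
  const slide = args as Slide;
  if (!slide.title || !slide.body) throw new Error("Schema violation");
  story.push(slide); // triggers the UI to render the new slide
}
```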
7: Memory + Thinking
Shared memory contextualizes information across interactions (sketched below).
Brushed up against a quality-vs-time barrier, so we added a richer, optional thinking mode.
~5 or 30 seconds
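Sketched with invented names: one shared memory object that every interaction reads and writes, plus an opt-in flag for the slower thinking path:

```typescript
interface Memory { entries: { role: string; text: string }[] }

// Stand-in for the real model client.
async function callModel(system: string, user: string): Promise<string> {
  return `response to: ${user}`;
}

async function ask(memory: Memory, question: string, thinking = false): Promise<string> {
  // Every interaction sees the same accumulated context.
  const context = memory.entries.map((e) => `${e.role}: ${e.text}`).join("\n");
  const system = thinking
    ? "Reason step by step before answering." // richer ~30s path
    : "Answer directly and briefly.";         // fast ~5s default
  const answer = await callModel(system, `${context}\nuser: ${question}`);
  memory.entries.push({ role: "user", text: question }, { role: "model", text: answer });
  return answer;
}
```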
We're still actively developing the underlying AI framework that supports the product, but we've reached a point where the fundamental issues, namely speed and reliability, are performing well enough to let us focus on fine-tuning the architecture.
Acute focuses include continually improving reliability, raising text quality through better system prompts, sourcing more kinds of assets, and adding capabilities.
Bigger sprints include devising ways for users to customize their own system prompts to tailor generation to their learning preferences (sketched below), as well as continuing to experiment with how the product's UI/UX interfaces with the underlying models.
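One way those custom system prompts could compose, with made-up preference fields:

```typescript
// Illustrative: user learning preferences fold into the system prompt.
interface LearningPrefs { tone?: string; depth?: "overview" | "deep"; examples?: boolean }

function buildSystemPrompt(base: string, prefs: LearningPrefs): string {
  const parts = [base];
  if (prefs.tone) parts.push(`Write in a ${prefs.tone} tone.`);
  if (prefs.depth === "deep") parts.push("Favor depth over brevity.");
  if (prefs.examples) parts.push("Give a concrete example for each concept.");
  return parts.join(" ");
}
```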
Generation Time: ~5 minutes → ~5 seconds
Generation Reliability: ~50% → ~95%
Memory and Context: implemented a memory-and-context strategy that supports new features and UX improvements.
Interrogability, meaning the capacity for generations to be interrogated (how easy it is for users to converse with pieces of what the models produce, ranging from pages, to images and paragraphs, to parts of images, phrases, and words, for clarity, context, or verifiability): 1 → plethora.
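Roughly how a selection anchors a follow-up, with assumed field names:

```typescript
// Illustrative: any selectable piece of a generation carries enough
// context to ground a conversation about it.
interface Selection {
  kind: "page" | "image" | "paragraph" | "phrase" | "word";
  content: string;       // the selected text or an image description
  parentContext: string; // surrounding material for grounding
}

function buildInterrogationPrompt(sel: Selection, question: string): string {
  return [
    `The user selected a ${sel.kind}: "${sel.content}"`,
    `Surrounding context: ${sel.parentContext}`,
    `Their question: ${question}`,
    "Answer for clarity, context, or verifiability, as appropriate.",
  ].join("\n");
}
```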