The Architect’s Dilemma: Rethinking Assessment When AI Does the Heavy Lifting

I’ve been sitting with a specific realization this week, and it’s uncomfortable.

In under sixty minutes, I designed, scaffolded, coded, and deployed a production-grade machine learning application. I didn’t just write a script; I built a system.

With the help of Antigravity, Google’s agentic coding assistant, I went from a rough idea (“predict crypto prices with sentiment”) to a fully functional dashboard with:

  1. Multimodal Pipelines (fusing API data, NLP, and price history).
  2. Deep Learning Models (a stacked LSTM in PyTorch).
  3. Premium UI/UX (responsive, dark-mode Streamlit dashboard).
  4. Academic Deliverables (a complete LaTeX report compiled to PDF, plus both initial and final presentation slides).
  5. Deployment (fully initialized GitHub repo).

I did not write a single line of code.

And yet, I “built” it.

The final “CryptoPulse” application was generated via prompting in under an hour.

🛑 The “Oh S**t” Moment

As educators, we talk about AI as a tool for “efficiency.” But when the efficiency gain is 100x, it’s not just faster—it’s conceptually different.

When the agent autonomously initialized the Git repository and pushed the code without me leaving the chat window, I realized something profound:
The “Product”—the code, the report, the slide deck—has become trivially cheap.

For decades, our assessment models have been product-centric:

  • “Submit your code.”
  • “Hand in your report.”
  • “Show me your slides.”

If a student can generate a Distinction-level product in an hour by knowing what to ask, then grounding our assessment in the product alone is no longer just fragile—it’s obsolete.


🧠 From Mason to Architect

If the AI is the mason laying the bricks (writing the syntax, rendering the CSS, compiling the LaTeX), the student must become the Architect.

In this new paradigm, competence isn’t defined by manual execution. It is defined by:

  1. Vision & Specification: Can you articulate a complex problem clearly enough for an intelligent agent to solve it?
  2. Critical Evaluation: When the AI gives you a generic solution (as it did initially for me), do you have the domain knowledge to spot the flaws?
  3. Iterative Refinement: Can you guide the system from “mediocre” to “excellent”?
  4. Systemic Thinking: Do you understand how the components (Data -> Model -> UI) fit together, even if you didn’t hand-stitch every seam?

The Evidence: A Log of “Cognitive Leadership”

To illustrate the point, look at how my prompts evolved. The value wasn’t in the initial request, but in the specific, knowledgeable corrections that followed.

1. The “Blue Sky” Vision

Prompt: “I want to create a multimodal sentiment and price analysis project… It needs to be a Streamlit app, use an LSTM model, and look ‘premium’ with a cyberpunk aesthetic. Also, generate a LaTeX report and presentation slides.”

Result: A generic, functional prototype.


📂 The Artifacts: “Done” in Minutes

To illustrate the scope, these are the actual deliverables the agent generated. The full source code is available at:
👉 GitHub Repository: nizamkadirteach/cdsproject

The 10-Page LaTeX Report:

The Initial & Final Presentation Slides:

Initial Presentation | Final Presentation

2. The Domain Critique (Where Assessment Should Live)

Prompt: “The UI text contrast is too low on the dark background. Fix the CSS. Also, the data isn’t saving to disk—I need a script to ensure the raw and processed data is present for grading requirements.”

Insight: I knew that “in-memory” data wasn’t robust enough for a reproducible scientific project. I had to intervene.
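For readers who want to see what that intervention amounts to, here is a minimal sketch of a persistence check. It is not the script from the repository: the folder layout (raw/ and processed/), the file names, and the helper names save_rows and missing_data_files are all assumptions I made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical layout: these file names are illustrative assumptions,
# not the actual files in the CryptoPulse repository.
REQUIRED = ["raw/prices.csv", "processed/features.csv"]


def save_rows(rows, path):
    """Write a non-empty list of dict rows to a CSV file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def missing_data_files(root, required):
    """Return the required files that are absent or empty under root."""
    root = Path(root)
    return [
        rel
        for rel in required
        if not (root / rel).is_file() or (root / rel).stat().st_size == 0
    ]


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        print("before saving:", missing_data_files(tmp, REQUIRED))
        save_rows([{"date": "2024-01-01", "close": 1.0}], Path(tmp) / REQUIRED[0])
        print("after saving raw data:", missing_data_files(tmp, REQUIRED))
```

The point is not the code itself but the decision behind it: someone had to know that a reproducible project needs its raw and processed data on disk, and ask for a check that fails loudly when it is not.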

3. The “Rigor Check”

Prompt: “Can you check and validate you have provided and done what the project requirements are? Verify the ‘Full Marks’ criteria. also can we do a user manual and a pipeline diagram?”

Insight: I shifted from “builder” to “auditor.” I forced the AI to check its own work against a rubric.


🔥 The New Assessment Contract

So, what are we actually assessing in 2026?

It cannot be the output. The output is a given.
It must be the process of curation and critique.

I propose we stop grading the artifact and start grading the interaction:

  • Don’t ask: “Show me your code.”
  • Ask: “Show me the prompt log where you debugged the dataset alignment error.”
  • Ask: “Why did you reject the first three UI designs the AI proposed?”
  • Ask: “Where is the hallucination in this generated report?”
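One way to make “grading the interaction” concrete is to ask for the prompt log as a structured record rather than a screenshot of a chat. The sketch below is only one possible shape, under my own assumptions: the PromptEntry fields, the kind labels (“vision”, “critique”, “audit”), and the helper names are hypothetical, and the prompts are abbreviated from my own log above.

```python
from dataclasses import dataclass


@dataclass
class PromptEntry:
    step: int
    kind: str       # "vision", "critique", or "audit" (labels are an assumption)
    prompt: str     # what the student asked the agent
    rationale: str  # the student's own explanation of why they intervened


def entries_of_kind(log, kind):
    """Return the entries in the log that carry the given kind label."""
    return [entry for entry in log if entry.kind == kind]


def unexplained(log):
    """Return steps where the student corrected or audited the agent
    but gave no rationale: the places a grader would want to probe."""
    return [
        entry.step
        for entry in log
        if entry.kind != "vision" and not entry.rationale.strip()
    ]


# Abbreviated from the three prompts quoted earlier in this post.
# The rationale for step 3 is left blank on purpose to show what gets flagged.
LOG = [
    PromptEntry(1, "vision", "I want to create a multimodal sentiment and price analysis project...", ""),
    PromptEntry(2, "critique", "The UI text contrast is too low... the data isn't saving to disk...",
                "In-memory data is not robust enough for a reproducible project."),
    PromptEntry(3, "audit", "Can you check and validate you have done what the project requirements are?", ""),
]

if __name__ == "__main__":
    print("critique steps:", [e.step for e in entries_of_kind(LOG, "critique")])
    print("steps missing a rationale:", unexplained(LOG))
```

The rationale field is where the assessment would live: the prompt shows what the student asked for, the rationale shows whether they knew why.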

The students who will thrive are not the ones who can type Python syntax the fastest. They are the ones with the taste, judgement, and theoretical depth to wield these infinite-leverage tools responsibly.

We need to stop teaching students to be faster bricklayers. The robots lay bricks just fine.
It’s time to teach them to be Architects.


💬 Join the Conversation

I’m sharing this Antigravity experience because it felt like a glimpse of a future that is already here.


How are you rethinking your rubrics for next semester?