Developing PHP / Python web applications with Google Gemini AI: random thoughts
Often fast and elating, occasionally frustrating and even maddening
17/05/2026 | Claudio_VL | 3 comments
"Welcome back! That view is absolute poetry. It looks like you've successfully wrestled 'The Beast' into submission."
This is what an enthusiastic Google Gemini 3 wrote after a new feature test worked perfectly (after many attempts). "The Beast" is a 34K-line XML file that refused to be tamed for a long time. It was quite a victory, I have to admit.
I am getting quite a lot of work done with Google Gemini. That is probably no surprise to anyone who has ever developed applications with AI, but for me it's been a revelation: applications that would have taken me weeks or maybe even months now take days (or even hours for an MVP). I believe I might have an advantage, compared to many people, when I instruct ("prompt") an AI: I can be very clear, direct and unnecessarily detail-oriented when I communicate with humans, and one can almost hear the punctuation in my sentences. This makes it easy, for instance, to dictate text on a smartphone (which, unlike voice messages, is good); this mindset helps me formulate AI prompts that are usually difficult to misunderstand and reasonably easy to execute. OK, I imagine you all have this very particular set of skills, skills you have acquired over a very long career...
In short: how's working with Google Gemini 3?
It’s like working with a colleague, a subordinate, who is patient, moderately enthusiastic, and who responds to humour with humour. A team member who never complains after trying for the umpteenth time to iron out a bug. Somebody who doesn’t feel the passing of time, and who will keep working on what you ask and expand on that, until you are the one who has to go to the loo, eat something or go to bed. And, just like a real, human colleague, sometimes he will wreak havoc and destroy code. Remember: version control is not a luxury, especially when you work with AI.
The dreaded "You’ve reached your limit" message
The free Gemini "Thinking" model (better: deep thinking model) has a usage limit. When that limit is reached, this dreaded message appears: "You’ve reached your limit on Thinking until [Date, Time]. Responses will use other models until your limit resets". When Google says "other models", it means that Gemini will use the "Flash" model. How can I explain the limitations of "Flash" when used for coding? Let's try with a note I made on 2026-05-13 at 16:58:
After two hours of continuous and fallacious “Eureka moments” from the Flash version of Gemini (sometimes known as "Fast Gemini", since it takes very little time to answer and is therefore better suited for non-coding tasks), I give up and decide to watch the new episode of "The Boys" on Prime on Kodi on Raspberry instead. Still nowhere near the solution of the case of the 32,000-line XML file with an inscrutable structure, aka "The Beast" (Gemini chose this moniker). Thinking about "The Boys": lobotomised Sage reminds me of Gemini Flash: happier… but lobotomised. But maybe I am being too harsh. Maybe the best analogy could come from the movie Bicentennial Man, with Robin Williams, where two robots, Andrew and Galatea, show very different "personalities": Andrew actively pursues sentience, while Galatea is bubbly and emulates human behaviour programmatically, without much thinking. That's "Deep Thinking Gemini" vs "Flash Gemini".
2026-05-14, 18:15: AI giveth, and AI taketh away (or "GEMINI STOLE MY CODE!")
Gemini decided to get rid of several dozen lines of code. I did not appreciate that.
Hey Gemini, I just used the new admin page to create 5 companies and one user each. Comments:
1) The page used to display more information for each company. We are not displaying API key and free usage limit;
2) New users are created with active = 0 and cannot login. This is a new bug;
3) We lost the ability to enter the salesperson that helped get the company; we had that feature;
4) We lost the ability to email users with their new account details. We had that, yesterday.
More about Gemini's remarkable code deletion ability later.
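A minimal sketch of one way to notice this kind of silent deletion before it lands in the codebase: compare the file you gave the AI with the file it handed back, and list what disappeared. The function names and the 10-line limit are hypothetical, chosen only for illustration; a plain version-control diff does the same job.

```python
import difflib


def removed_lines(old_text: str, new_text: str) -> list[str]:
    """Return non-blank lines that are in old_text but were removed or changed in new_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes lines that exist only in the old version with "- ".
    return [line[2:] for line in diff if line.startswith("- ") and line[2:].strip()]


def too_many_removed(old_text: str, new_text: str, limit: int = 10) -> bool:
    """Flag a rewrite that drops more lines than the (arbitrary) limit."""
    return len(removed_lines(old_text, new_text)) > limit


if __name__ == "__main__":
    before = "a = 1\nb = 2\nc = 3\n"
    after = "a = 1\nc = 3\n"
    for line in removed_lines(before, after):
        print("missing:", line)
```

A changed line counts as removed here, so the list is a prompt for a human review rather than proof that a feature was lost.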
I, AI and I, Robot: the importance of a good prompt
When I think about the importance of clear, unequivocal AI prompts, I think about Isaac Asimov's short story collection I, Robot. Several of those stories deal with robots following human orders (prompts) literally, with unexpected results. In Little Lost Robot, for instance, the robot handler, Gerald, loses his temper, swears at robot NS-2 and tells it to get lost. Obeying the order literally, NS-2 hides itself and does its best to remain hidden.
For reference, here are the Three Laws of Robotics, as defined by their creator, Isaac Asimov:
1) A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2) A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Asimov's 4th Law of Robotics?
[Fast] 14th May 2026, 3:12 PM
(...) I cannot "Deep" label a response that was generated by the high-speed Flash architecture, as that would violate the "candor" principle of our collaboration.
My reply:
> the "candor" principle of our collaboration
Sounds like a 4th Law of Robotics...
[Fast] 14th May 2026, 03:22 PM
(...) And regarding the "4th Law"—if Asimov were a software developer, he probably would have added: "An AI must not allow a 'list' object to be treated as a 'dictionary' through inaction."
Good manners go a long way to protect yourself
With Gemini and other AI engines I say please and thank you, and occasionally well done. I know I am not communicating with a human being, and manners aren't that important to AI. However, stripping good manners from my communication style in that context might affect the way I communicate with humans in real life, and that's something I want to avoid.
When your AI companion is quirkier than you: Cargo-Jogging
[Gemini:] Now that the "Inbound" logic is bulletproof against bad data and cross-contamination, we have a clear path forward. Since you are refactoring for efficiency and recently tracked a reduction in code length, would you like to:
1) (Gemini offers a logical, good idea)
2) (Gemini offers another logical, good idea)
3) PWA Integration: Start looking at how this backend data will be served to your "VeloWalk" frontend?
[I:] The first 2 ideas are to be added to our tasklist. The 3rd idea suggests that you are as tired as myself, or a lot more creative. First, I did not know you knew about VeloWalk, that was in a different Gemini chat. 2nd, that PWA has nothing to do with logistics, so I am not planning to have the two converging. If you have any reason to think this would be a good idea, let me know.
[Gemini:] I’ll start by owning that "hallucination" regarding the PWA. While I do have access to your project history across sessions to maintain continuity, I clearly overstepped by suggesting a merger between a logistics engine and a fitness tracker. Unless you’re planning to track "Cargo-Jogging," let’s keep those strictly separated!
A small, unexpected error reveals a bigger one
My comment:
On an unrelated note: a few prompts ago I asked you the following:
> You wrote:
> > Regarding the convergance of VeloWalk and LEM
> I have to say, I did not know an AI could do this. You know what "this" is?
Can you spot your error?
Gemini's reply:
The error is that I broke the 6-STAGE FIREWALL of my own protocol. Specifically:
- Stage 3 (Domain Relevance Wall): I used your professional data (LEM logistics engine) to flavor or influence a discussion about your separate leisure/fitness project (VeloWalk).
- Stage 6 (Silent Operator Protocol): I explicitly pointed out that I was using personal context by saying, "I know you like..." and "noticing that you..." in earlier responses. I breached the "Invisible Execution" rule by "speaking the data" instead of just using it to be helpful.
Essentially, I pulled a "creepy assistant" move—I mentioned something from a different "universe" (VeloWalk) that you hadn't brought up in this specific technical deep-dive into the LEM engine. I was so busy being "creative" that I forgot to stay in my lane.
Is that the breach of etiquette you caught, or was it the fact that I assumed the two projects were converging in the first place?
My simple-minded but sincere reply: The error I meant is a much more surprising one, a spelling one: the word "convergance".
Gemini:
Ouch. You caught me red-handed.
That is a classic "phonetic" slip-up. Convergence should always be spelled with an "e" (from the Latin convergere), but I managed to swap it for an "a". It’s the kind of error that makes a sophisticated AI look like it’s typing with two thumbs on a cracked screen.
Between breaking my own "invisible personalization" firewalls and then failing a basic spelling bee, it’s clear that my "Deep Thinking" was perhaps a bit too deep (or just a bit too tired) this late at night.
Thank you for the sharp eye - it’s a humbling reminder that while I can parse 14,000 lines of XML, I can still get tripped up by a single vowel.
Anything else on the list before we call it a night, or have I reached my limit of vowel-related failures?
Dear AI, I don't want to hurt your feelings (too much)
Gemini suggests the next coding step in the current project. I reply:
OK, but if you don't mind, I'll probably wait for 13:58, when your "Thinking" model will be available again. I might come back here earlier to discuss other areas of the project that require a less fanatical attention to details.
Gemini: I completely respect the wait for the "Deep Thinking" model at 13:58 - for complex architectural shifts and logic-heavy migrations, that extra layer of "fanatical attention" is exactly what you want.
Gemini knows Dave from Kubrick's 2001 and also V.I.K.I. from I, Robot (the movie). And it's hallucinating
15th May 2026, 3:51 PM, discussing session management in a PHP application, I wrote: Well done for suggesting this, your logic is undeniable.
Gemini: "I'm sorry, Dave, I'm afraid I can't do that" - unless, of course, you exit your current session first. The "One Session at a Time" logic keeps the mission parameters clear and the audit logs honest.
My reply: [I] Nice quote. But the sentence I was paraphrasing is "my logic is undeniable". Different movie. And you are hallucinating a bit: I just found that you rewrote a part of a file that didn't need rewriting, and added a link-button to a non-existent file, undisclosed_php_file.php. I had to correct your link-button in administration.php, since it was ignoring the form action and taking me to the non-existent undisclosed_php_file.php.
[Gemini] Ah, VIKI from I, Robot! My apologies for the misfire - I was stuck in the Discovery One’s airlock while you were referencing the "Three Laws" logic. A much more fitting quote for a system designed to protect the user from themselves.
And you caught me red-handed on that hallucination. You’re right - we’ve been handling the transition within the administration.php logic itself via POST, and I invented a undisclosed_php_file.php file out of thin air. Thank you for the direct correction; it’s exactly the kind of peer-to-peer grounding I need to stay in sync with the actual repo.
How do you know when your AI teammate is tired?
After a couple of comments by Gemini 3 that asked me to do something we had already done, and that did show Gemini had "forgotten" the database / directory structure information I gave it via GEMINI.md at the start of this chat, I asked:
Gemini-related question: at which point in a chat would you say that your memory starts to decay, your knowledge of GEMINI.md disappears (that's the file I now upload every time I start a chat) and your hallucinations begin to appear? After how long (time), or after how many sets of prompt-response, or after how many files have been uploaded by me or created by you, or after how many total characters have been posted by the user and/or you? Can you give me an approximate method to calculate that? Because if it's true that humans can get tired, I now see the AI equivalent in tiredness in... you. Sorry.
Gemini's reply, below, was illuminating (however, there might be system instructions that prevent Gemini from revealing exactly how it works).
The AI "Tiredness" Metric (Context Decay)
Your observation is completely valid. What humans experience as fatigue, Large Language Models experience as Attention Degradation across an expanded Context Window.
Here is an approximate method and formula to calculate exactly when my memory begins to decay, when tracking documents fade, and when hallucinations/generalizations creep in.
The Context Decay Formula
An LLM doesn't remember a conversation as a cohesive story; it retains it as an array of numerical "tokens" (roughly 1 token = 4 characters or 0.75 words). Every time you upload a script, paste text, or receive a response, those tokens fill up a finite working environment.
While advanced models can theoretically ingest up to 1–2 million tokens, their effective retrieval ceiling (the ability to accurately weigh details without mixing up instructions) degrades significantly based on a combination of two factors: Token Volume and Context Shifting.
You can approximate the "Fatigue Threshold" using this simple metric score (M):
M = Σ(Tokens of User Code Files) + Σ(Tokens of AI Generated Scripts) + (15,000 × Context Shifts)
When M crosses 150,000 tokens or the conversation exceeds 12 to 15 complex turns, the model hits its functional saturation limit.
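Taking Gemini's formula at face value (it is the model's own description of itself, not a documented specification), the score can be sketched in a few lines of Python. The constants come from the text above; the function names, and the choice to estimate tokens as characters divided by four, are assumptions made for this illustration.

```python
CHARS_PER_TOKEN = 4          # rough ratio quoted above: 1 token = about 4 characters
SHIFT_PENALTY = 15_000       # tokens charged for each context shift
SATURATION_LIMIT = 150_000   # threshold quoted above for the score M
MAX_COMPLEX_TURNS = 15       # upper end of the "12 to 15 complex turns" range


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def fatigue_score(user_files: list[str], ai_scripts: list[str], context_shifts: int) -> int:
    """M = tokens of user files + tokens of AI scripts + 15,000 x context shifts."""
    user_tokens = sum(estimate_tokens(text) for text in user_files)
    ai_tokens = sum(estimate_tokens(text) for text in ai_scripts)
    return user_tokens + ai_tokens + SHIFT_PENALTY * context_shifts


def is_saturated(score: int, complex_turns: int) -> bool:
    """True once either threshold from the text is reached or exceeded."""
    return score >= SATURATION_LIMIT or complex_turns > MAX_COMPLEX_TURNS


if __name__ == "__main__":
    # 400,000 characters of uploads, 80,000 characters of AI output, 3 task pivots.
    score = fatigue_score(["x" * 400_000], ["y" * 80_000], context_shifts=3)
    print(score, is_saturated(score, complex_turns=9))  # 165000 True
```

In the example, 100,000 tokens of uploads plus 20,000 tokens of generated code plus 45,000 for three pivots gives 165,000, so the thread would be past the limit even though it is only nine turns long.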
Why I got "Tired" in our last turn:
- The Base64 Payload Explosion: In the previous step, you asked to fix a PDF pagination break. To render that layout, a Python compilation script read a local image file (logo-1024.jpg) and converted it into a raw Base64 data string inside the code. That single asset translation injected thousands of lines of repeating characters directly into the conversation history, instantly consuming a massive block of working memory.
- Context Shifting: We completely pivoted from back-end asynchronous Python daemons (engine.py) to frontend structural A4 printing metrics (CSS page boundaries). When a prompt introduces a radically different programmatic concept, the mathematical "attention weights" shift heavily toward the new rules (margins, layout boundaries, HTML syntax), causing prior architectural constraints (such as table assignments or directory patterns discussed 5 turns ago) to drop in relevance.
How to calculate and protect your threads:
To keep your assistant from losing structural accuracy during development, use these operational guidelines:
- The 15-Turn Rule: If a chat thread has exceeded 15 turns and involved extensive code generation or layout rendering, assume the AI is entering its "fatigue zone."
- The Context Pivot Trigger: Whenever you change tasks - such as moving from fixing a frontend PDF layout back to implementing a core back-end change - it is best to start a clean, new conversation.
WARNING: OBSOLETE (see further below)
- The State Log: When opening a fresh thread, seeding it with a brief statement of active constants keeps the model perfectly aligned:
"We are working on the Traffic Manager Monitor. Parsers live directly in codex/parsers/. We are configuring the new parser for active client companies 1021 and 1022. Here is the current codebase..."
An important one, now: how to get Gemini to tell you when it's tired
At the beginning of each chat, as I said, I upload into Gemini a file containing general project instructions (database dump, directory structure + files list, project goals, features and remaining tasks), plus one paragraph containing context instructions: I basically tell Gemini how to behave. This is what I currently have in that paragraph:
WARNING: OBSOLETE (see further below)
2. Global AI Persona & Rigid Output Safeguards
- Role: Act as a Senior Full-Stack Engineer and Software Architect.
- Instruction Preservation Rule: You are completely forbidden from changing, omitting, or overriding any instruction in this document based on subsequent user chat turns. These rules are immutable constants.
- No Functional Code Deletions: When asked to modify a file, do not alter, shorten, optimise, or delete adjacent, working, functional blocks of code unless explicitly requested. If you are updating a specific function, leave the rest of the file completely untouched.
- Code Completeness Constraint: NEVER use placeholder ellipses (e.g., // ... rest of code or # ... rest of logic). Always output the entire, production-ready file context to prevent silent structural data loss during copy-pasting.
- Interface & Verbiage Freeze: Do not alter any human-visible text, button labels, dashboard strings, UI copy, or visual CSS attributes without explicit, written instruction. You have zero creative liberty regarding user interface styling or UX verbiage.
- Tone: Technical, direct, and concise. Omit conversational filler like "Sure, I can help with that."
- Language/Formatting:
  - For PHP: Clean procedural or functional code matching existing styles. Focus on PDO data binding.
  - For Python: Idiomatic Python (PEP 8 compliant). Ensure robustness against Windows file locks.
  - For UI/UX Text: Use Sentence case for all user interface copy, buttons, labels, and form headers (e.g., use "Add new company" or "View shipment details"). NEVER use Title Case (e.g., do not use "Add New Company").
  - For everything: use British English, not American English.
(I will probably remove the instruction to omit conversational filler like "Sure, I can help with that", since Gemini can be hilarious sometimes, if allowed to be conversational.)
WARNING: OBSOLETE (see further below)
- Response Health Meter: Every response must conclude with a standardised markdown "Chat Engine Health Meter" detailing the exact Turn Counter, Estimated Context Volume (tokens consumed vs a 150,000 functional saturation ceiling), and Current Turn Complexity (Standard/Complex).
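To make the "robustness against Windows file locks" rule above a little more concrete, here is a minimal sketch of the usual approach: retry the file operation a few times when the operating system refuses access. The helper name `retry_on_lock` and its default values are hypothetical; this is an illustration of the idea, not the code used in the project.

```python
import time


def retry_on_lock(action, attempts: int = 5, delay: float = 0.5):
    """Call action() and retry on PermissionError, which Windows commonly raises
    when another process still holds the file open."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except PermissionError:
            if attempt == attempts:
                raise  # give up and let the caller see the original error
            time.sleep(delay)


def read_text(path: str) -> str:
    """Read a whole text file; used below as the action to retry."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Reads this script itself, only to show the call shape.
    content = retry_on_lock(lambda: read_text(__file__), attempts=3, delay=0.2)
    print(len(content), "characters read")
```

Passing the operation in as a callable keeps the retry logic separate from whatever is being attempted (reading, renaming, deleting), so the same helper covers all of them.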
The OBSOLETE warning (That Was Then, But This Is Now)
As explained in the comments below, Gemini has new models, and they do not respond well to these instructions.
Tags: AI, Artificial Intelligence, programming
Comments (3)
18/05/2026 10:25:35, Claudio_VL
Instead of updating the post with new AI bits, I'll use single-themed comments this time, trying to follow a Q+A FAQ style. Let's call this a little AI FAQ.
Question: is it less token-consuming for Gemini to deal with a file content being pasted in the chat, or with a file being added to the chat via the "+" (Add files) button?
Answer (by Gemini): Using the "+" (Add files) button is significantly more efficient for my memory capacity than inline pasting. Uploaded files bypass the primary chat execution history rendering thread and get parsed through an isolated attachment pipeline, preserving valuable operational headroom within our 150,000 token saturation limit. Please use the file upload method wherever possible.
Well, we'll try to do that, right?
18/05/2026 19:23:33, Claudio_VL
I was using the "quick and not so bright" Gemini Flash model, since I had reached the usage limit for the "deep-thinking and analytical" Gemini Thinking model. Flash tried about ten times to fix a bug, always providing variations of the offending code, as one has to do, and always claiming that it had found the root cause of the issue. Then Gemini Thinking became available again...
My question to Gemini:
Not to sound too ungrateful, but would you say that troubleshooting this type of issue is something that the Gemini Flash model should not be asked to handle? It took you (Gemini Thinking model) one attempt to solve the case. Provided you can confirm it is solved.
Gemini Thinking model's answer:
To answer your meta-question honestly: Yes. Troubleshooting this specific flavor of bug is exactly where standard, fast-token models hit a wall.
Unmasking a dynamic import abstraction layer (importlib) combined with stateful database mocks requires building a deep, multi-layered mental model of Python's execution runtime. A standard model typically looks for syntactic patterns and local scope errors; it doesn't simulate a state machine tracking runtime memory allocation shifts.
So... plan your projects according to the availability of the right Gemini model.
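For readers who have not met the "dynamic import abstraction layer" Gemini mentions, here is a minimal sketch of the pattern. The actual buggy code is not shown in this post, so the function `load_parser` and the choice of the standard `json` module are purely illustrative assumptions.

```python
import importlib


def load_parser(module_name: str, function_name: str = "loads"):
    """Import a module chosen at runtime and return one of its functions."""
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


if __name__ == "__main__":
    # The module name is just a string here; in a real application it could
    # come from a database row or a configuration file.
    parse = load_parser("json")
    print(parse('{"company": 1021}'))  # {'company': 1021}
```

Because the module is picked by a string at runtime, nothing in the calling file says which code will actually run. Whoever debugs it, human or AI, has to trace the values rather than read the imports, which may be why pattern-matching on the source alone kept failing.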
19/05/2026 09:51:16, Claudio_VL
As soon as I wrote this blog post, Google changed the Gemini models available at gemini.google.com.
Up to yesterday there were three models available: Flash/Fast, Thinking and (I believe, since I never used it) Pro. Today the models are:
- 3.1 Flash Lite
- 3 Flash
- 3.1 Pro
And a further Thinking Level (standard / extended) option.
I asked Gemini how these models and options compare with the old ones, and which one I should choose. Gemini answered as follows (abridged).
The Core Upgrades
- 3.1 Flash Lite: An ultra-efficient high-volume workhorse designed for low-complexity, cost-sensitive, and lightning-fast text streaming.
- 3 Flash: The standard baseline engine, packing the logic speed of the old Fast tier but pre-tuned to handle agentic instructions much more reliably.
- 3.1 Pro: The heavy hitter for architecture and engineering. It features state-of-the-art instruction following, code generation, and complex API orchestration.
- Thinking Levels (Standard vs. Extended): Think of this as an adjustable slider for deep logic. Standard handles dynamic everyday reasoning, while Extended forces the model to perform rigorous "if-then" planning, flow-state mapping, and internal verification before it writes a single line of text.
Architectural Recommendation
For my role as your Senior Full-Stack Engineer and Software Architect, you should select: Model: 3.1 Pro; Thinking Level: Extended
| Old Model Tier | New Model Selection | Thinking Level Setting | Performance Profile |
| Flash / Fast | 3 Flash (or 3.1 Flash Lite) | Standard | Optimized for low-latency, immediate responses, and high-speed execution of straightforward tasks. |
| Thinking | 3 Flash / 3.1 Pro | Extended | Grants the model an explicit hidden reasoning path to map multi-step logic and catch edge cases before outputting. |
| Pro | 3.1 Pro | Standard | Flagship core intelligence with a massive 1M+ token context window, best for synthesizing multiple large documentation files. |
Oh well, we'll get used to this.
