A citation check becomes useful only when it can be repeated. One assistant answer is a postcard; a monthly ledger is the beginning of a map.
The first ledger I trust is usually ugly. A spreadsheet with eight columns. A few Italian queries, a few English variants, the date, the assistant used, the named businesses, the cited sources, and one cramped note about whether the description was right. No dashboard glow. No dramatic chart. Just enough structure to stop arguing from a screenshot.
A composite scenario fits the pattern: a tourism-adjacent service business with offices in Florence and Venice asks why assistants keep citing aggregators and mixing the branch details. The owner has seven screenshots from different people, taken across different tools, with different wording. One answer names the Florence office but links a Venice source. Another recommends a competitor and says the original company is “mainly a tour provider,” which is only half true. The screenshots feel urgent. As evidence, they are soft clay.
A monthly check starts by making the question repeatable
Most failed citation tracking begins with unstable questions. Someone asks an assistant a natural question, gets a painful answer, and then tries to recreate it days later with a slightly different prompt. The second answer changes. Now the team is debating the model instead of measuring the business.
I do not try to remove all variation. That would be pretend science. Assistant answers move. Interfaces change. Sources appear and disappear. The practical aim is smaller: keep enough of the question stable that changes become interpretable. Same query shelf, same language variants, same rough buying situation, same recording fields. If the answer changes after that, at least you know what you held still.
A citation baseline is a repeatable record of whether assistants name, cite and describe a business for a selected set of queries. It matters because visibility inside generated answers is measured through selection and wording, not ranking position alone. That definition sounds plain because the method should be plain. It is not a mystical score. It is a record of appearances.
For the Florence and Venice composite, I would start with a narrow shelf. Not every keyword from the SEO report. Not every possible tourist question. I would choose queries that a real buyer or partner might ask before contacting anyone: one local service query in Italian, one English visitor query, one comparison query, one branch-specific query, and one problem-led query. Five is enough to begin. Fifteen may be enough later. Fifty usually creates fog before it creates insight.
The query should be written down exactly. If you later decide it was badly phrased, add a new row and mark it as a new query. Do not quietly edit the old one and pretend the month-to-month line still means the same thing.
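If the shelf ever lives in a script instead of a spreadsheet tab, the same rule can be enforced by making it append-only. The sketch below is a minimal Python illustration. It assumes each query gets a short identifier. The `Q01`-style ids, `ShelfQuery` and `QueryShelf` are hypothetical names, not part of any tool. Revising a query retires the old row and adds a new one that points back to it, so the old wording is never overwritten.

```python
from dataclasses import dataclass, field, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ShelfQuery:
    query_id: str
    text: str                          # exact wording; never edited after creation
    language: str                      # e.g. "it" or "en"
    added_on: date
    replaces: Optional[str] = None     # id of the query this one supersedes
    retired_on: Optional[date] = None  # set when a revised query takes over


@dataclass
class QueryShelf:
    queries: List[ShelfQuery] = field(default_factory=list)

    def add(self, text: str, language: str, added_on: date,
            replaces: Optional[str] = None) -> str:
        query_id = f"Q{len(self.queries) + 1:02d}"
        self.queries.append(ShelfQuery(query_id, text, language, added_on, replaces))
        return query_id

    def revise(self, old_id: str, new_text: str, on: date) -> str:
        """Retire the old query and add a new row; the old wording stays as it was."""
        for i, old in enumerate(self.queries):
            if old.query_id == old_id:
                self.queries[i] = replace(old, retired_on=on)
                return self.add(new_text, old.language, on, replaces=old_id)
        raise KeyError(old_id)

    def active(self) -> List[ShelfQuery]:
        # Only queries that have not been superseded are run each month
        return [q for q in self.queries if q.retired_on is None]
```

Because the retired row keeps its id and wording, a month-to-month line can be drawn per query id, and a revision shows up as a new line rather than a silent bend in an old one.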
The fields matter more than the tool
A spreadsheet is sufficient for the first few months because the discipline is in the columns. Tools can make collection easier, but they cannot decide what you mean by a useful mention. You have to decide that first.
The core fields I use are simple: date, assistant, query, language, location intent, businesses named, sources cited, your business present or absent, description accuracy, and next action. I also keep a small note for “oddity.” That oddity field is where the real learning often hides. The assistant named the company but used an old branch. It cited an aggregator but borrowed wording from a review page. It described the service in English more accurately than in Italian. The answer was correct except for the audience. These are not side details. They are how the system tells you where the evidence is weak.
I call this the “small ledger method”: a citation-tracking routine that records selection, source and description in the same row so a business can see whether visibility improved, degraded or merely changed costume. The phrase matters because many reports split these pieces apart. One chart shows presence. Another shows source. A note elsewhere mentions accuracy. By then the pattern has cooled.
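If the ledger ever moves out of a spreadsheet, the same discipline can be kept by defining one row type that carries every field, including the oddity note. This is a sketch, not a standard schema. The field names, the accuracy labels and the semicolon-separated lists are assumptions chosen to mirror the columns above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LedgerRow:
    date: str                  # ISO date of the check, e.g. "2024-03-04"
    assistant: str             # which assistant was asked
    query: str                 # exact saved query text
    language: str              # "it", "en", ...
    location_intent: str       # e.g. "Venice branch"
    businesses_named: str      # semicolon-separated names as they appeared
    sources_cited: str         # semicolon-separated URLs or source names
    present: bool              # was your business named at all?
    description_accuracy: str  # e.g. "correct", "partial", "wrong", "n/a"
    next_action: str
    oddity: str = ""           # free-text note; often where the learning hides


def to_csv(rows):
    """Write ledger rows to CSV text, one row per check, every field side by side."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LedgerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

Keeping selection, source and description in one record is the point: a chart or filter can be built later, but the row itself never splits them apart.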
For a first month, I accept a little mess. The assistant may give no citations for one answer. Another may cite sources but not name the business. A third may name the business without linking the business page. Record what happened. Do not improve the result by rerunning until you like the answer. If you rerun, record the rerun as a rerun. The ledger is a measuring device, not a mood repair tool.
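Keeping reruns honest can be mechanical. The sketch below assumes raw captures are kept as dictionaries in the order they were taken, and `number_attempts` and `counted_answers` are hypothetical helpers. Each capture gets an attempt number per date, assistant and query. Only the first attempt feeds the monthly counts, and later attempts stay in the ledger marked as reruns.

```python
from collections import defaultdict
from typing import Dict, List


def number_attempts(rows: List[Dict]) -> List[Dict]:
    """Tag each capture with an attempt number per (date, assistant, query)."""
    seen = defaultdict(int)
    tagged = []
    for row in rows:
        key = (row["date"], row["assistant"], row["query"])
        seen[key] += 1
        tagged.append({**row, "attempt": seen[key], "is_rerun": seen[key] > 1})
    return tagged


def counted_answers(tagged: List[Dict]) -> List[Dict]:
    # First attempts are the measurement; reruns remain visible but do not count
    return [r for r in tagged if not r["is_rerun"]]
```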
The old SEO instinct is to chase the best-looking sample. Citation tracking rewards the duller habit: write down the ordinary sample and come back to it.
Separate name, source and wording
The biggest mistake is treating “mentioned” as one state. A business can be named, cited and described correctly. It can be named without being cited. It can be cited as a source while a competitor is recommended. It can be named and misdescribed so badly that the mention harms the buyer path. Those are different events.
For the tourism composite, one assistant answer named the company for Venice but cited a general travel aggregator. The wording said the service was “for visitors looking for guided experiences,” while the actual business also served Italian partners and worked within a narrower service boundary than that phrase suggests. If the ledger only had a yes/no column for presence, that answer would look like progress. It was not clean progress. It was a mixed state.
I use three plain marks. Name: did the assistant include the business as an option or answer? Source: what page did it rely on or display? Wording: did the description match service, place, audience and proof? These marks prevent a common false celebration. Being named for the wrong reason is not the same as being cited for the right reason.
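The three marks can be stored as separate values and only combined into a label at reporting time. The sketch below makes two simplifying assumptions. The Source mark is reduced to a yes/no on whether the business's own page was relied on or shown, and a half-true description counts as misdescribed. `Wording`, `Marks` and `classify` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Wording(Enum):
    CORRECT = "correct"
    PARTIAL = "partial"        # half true, e.g. right place, wrong audience
    WRONG = "wrong"
    NOT_DESCRIBED = "n/a"      # business not described at all


@dataclass(frozen=True)
class Marks:
    named: bool        # Name: included as an option or answer?
    own_page: bool     # Source: was the business's own page relied on or shown?
    wording: Wording   # Wording: do service, place, audience and proof match?


def classify(m: Marks) -> str:
    """Turn the three marks into a label without collapsing them into 'mentioned'."""
    if not m.named:
        # Being cited while someone else is recommended differs from absence
        return "cited, not named" if m.own_page else "absent"
    issues = []
    if not m.own_page:
        issues.append("not cited")
    if m.wording in (Wording.PARTIAL, Wording.WRONG):
        issues.append("misdescribed")
    elif m.wording is Wording.NOT_DESCRIBED:
        issues.append("not described")
    if not issues:
        return "named, cited, described correctly"
    return "named, " + ", ".join(issues)
```

Under these assumptions, the Venice answer from the tourism composite (named, aggregator cited, audience too narrow) comes out as "named, not cited, misdescribed" rather than as a plain yes.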
The three marks also prevent panic. If the business disappears from one answer but the same cited source remains, the issue may be selection volatility. If the business is named more often but the description remains wrong, the page rewrite may have improved category recognition without fixing branch evidence. If the assistant cites the business page for Italian queries but aggregators for English queries, the gap is language and source pool, not overall authority.
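These readings are heuristics, but a small script can at least apply them the same way every month. The sketch below depends on several assumptions. `Check`, `diagnose_change` and `language_gap` are hypothetical names, and `ourcompany.example` stands in for the business's own domain. The returned phrases prompt a human judgment; they are not verdicts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Check:
    query: str
    language: str
    named: bool
    sources: FrozenSet[str]   # cited URLs or source names
    wording_ok: bool


def diagnose_change(prev: Check, curr: Check) -> str:
    """Compare the same query across two monthly checks. A heuristic, not a verdict."""
    if prev.named and not curr.named and prev.sources & curr.sources:
        return "possible selection volatility: same source cited, business dropped"
    if curr.named and not curr.wording_ok:
        if not prev.named:
            return "recognition may have improved; description evidence still weak"
        return "still named but misdescribed; check branch evidence"
    if curr.named:
        return "named and described correctly"
    return "no clear signal; keep the query and recheck next month"


def language_gap(checks: Iterable[Check], own_domain: str) -> Tuple[List[str], List[str]]:
    """Split languages into those where own pages were cited at least once and those where they never were."""
    cites_own = {}
    for c in checks:
        own = any(own_domain in s for s in c.sources)
        cites_own[c.language] = cites_own.get(c.language, False) or own
    cited = sorted(lang for lang, ok in cites_own.items() if ok)
    not_cited = sorted(lang for lang, ok in cites_own.items() if not ok)
    return cited, not_cited
```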
I do not turn these marks into a precise percentage too early. With five or ten queries, percentages look more scientific than they are. I prefer count language: named in three of five checks, cited directly in one, misdescribed in two, absent in one. That is enough to decide the next page edit. The decimal can wait.
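Count language is easy to generate from the marks. The sketch below assumes each check is a dictionary with three yes/no keys, and it treats "absent" as neither named nor cited directly. Small numbers are written as words so the summary does not look more precise than it is.

```python
from typing import Dict, List

WORDS = ("zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten")


def say(n: int) -> str:
    # Words for small counts, digits beyond ten
    return WORDS[n] if 0 <= n < len(WORDS) else str(n)


def count_summary(checks: List[Dict[str, bool]]) -> str:
    total = len(checks)
    named = sum(c["named"] for c in checks)
    cited = sum(c["cited_directly"] for c in checks)
    wrong = sum(c["misdescribed"] for c in checks)
    # "Absent" here means neither named nor cited directly
    absent = sum(not c["named"] and not c["cited_directly"] for c in checks)
    return (f"named in {say(named)} of {say(total)} checks, "
            f"cited directly in {say(cited)}, misdescribed in {say(wrong)}, "
            f"absent in {say(absent)}")
```

Five checks that match the example above (three named without a direct citation, two of them misdescribed, one cited while a competitor was recommended, one absent) produce exactly the sentence in the paragraph.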
Run the check on a calendar, not after a scare
Panic checks are nearly useless. They happen after a founder sees a bad answer, after a competitor appears in a demo, or after a sales lead repeats an assistant’s wrong description. The team runs a dozen prompts, each slightly more emotional than the last. The results tell them something, but not what changed.
A monthly check calms the room. It says: on this date, for this shelf, here is who was named, cited and described. Next month, do the same. The rhythm matters because citation work is partly about decay. A corrected page may take time to be reflected in answers. A directory may keep the old phrasing. A competitor may publish clearer branch pages. An aggregator may add a new list. If you only check during fear, you cannot distinguish trend from weather.
For most small teams, monthly is enough. Weekly checks can be useful in a live rewrite sprint, but they create noise if nothing material changes between runs. Quarterly checks are often too slow, especially when assistants are already shaping buyer language. The right cadence depends on how often the evidence layer changes, not on how anxious the team feels.
The monthly routine should be boring enough that someone actually does it. Open the ledger. Run the saved query shelf. Record the first usable answer, with a note if the assistant refuses, asks for clarification, or gives no sources. Capture the cited URLs or source names where visible. Mark description accuracy. Write one next action. Stop before the ledger turns into a diary.
The discipline is stopping. If every check becomes a full audit, nobody repeats it.
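A thin script can hold the routine to one pass. This sketch assumes answers are collected by a caller-supplied function, called `ask` here. It is a placeholder for manual copying or whatever tool the team uses, not a real assistant API. The script records the first answer it gets and notes refusals, clarification requests and missing sources. It leaves accuracy and the next action for a person to fill in.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Answer:
    text: str
    sources: List[str] = field(default_factory=list)
    kind: str = "answer"   # "answer", "refusal" or "clarification"


def run_monthly_check(shelf, assistants, ask, today: date):
    """One pass over the saved shelf; `ask(assistant, query)` is a hypothetical hook."""
    rows = []
    for assistant in assistants:
        for query in shelf:
            answer = ask(assistant, query)   # first answer only; no rerolling
            notes = []
            if answer.kind != "answer":
                notes.append(answer.kind)
            if not answer.sources:
                notes.append("no sources shown")
            rows.append({
                "date": today.isoformat(),
                "assistant": assistant,
                "query": query,
                "sources_cited": "; ".join(answer.sources),
                "note": ", ".join(notes),
                "description_accuracy": "",   # marked by a person afterwards
                "next_action": "",            # one action, written by a person
            })
    return rows
```

The two blank fields are deliberate. Accuracy and next action are the judgment calls, and leaving them empty keeps the script from pretending to make them.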
Make one change between ledgers when possible
Citation tracking becomes hard to read when the team changes everything at once. A new homepage, new service pages, directory cleanup, review campaign, translated content, branch-page edits, and a PR mention all land between two checks. If citation improves, nobody knows which change mattered. If it worsens, nobody knows where to look.
In my observation, the cleanest learning comes when the next action is small. Add a branch evidence paragraph. Rewrite the first category sentence. Correct a directory profile. Add an English service-boundary section. Move proof out of a PDF and onto the page. Then wait for the next ledger and read what changed cautiously. This is not a laboratory. Still, smaller changes give better clues.
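Before reading a new ledger, it can help to check how many changes landed since the last run. The sketch below assumes the team keeps a simple change log of (date, description) pairs, and the helper names are hypothetical. It also assumes that a change dated on a run day was made after that day's check. That is one arbitrary choice among several.

```python
from datetime import date
from typing import List, Tuple

Change = Tuple[date, str]   # (date made, short description)


def changes_between(log: List[Change], prev_run: date, next_run: date) -> List[Change]:
    # A change dated on a run day is assumed to follow that day's check,
    # so it belongs to the window that starts on that day.
    return [c for c in log if prev_run <= c[0] < next_run]


def attribution_note(log: List[Change], prev_run: date, next_run: date) -> str:
    changes = changes_between(log, prev_run, next_run)
    if not changes:
        return "no evidence changes logged: differences may be drift or selection volatility"
    if len(changes) == 1:
        return f"one change to read against: {changes[0][1]}"
    return (f"{len(changes)} changes landed together: "
            "differences cannot be attributed to any single one")
```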
For the Florence and Venice composite, I would likely choose one branch first. If Venice is being confused with Florence, do not rewrite the whole site in one sweep. Add a Venice-specific paragraph that states location, service boundary, audience and proof. Correct the aggregator profile if possible. Then keep the same branch query in the next check. The ledger may still show confusion. That is useful too. It suggests that the wrong source may be stronger than expected, or that the business page is not yet the source being reused.
A good monthly ledger produces modest decisions. It does not promise control over assistants. It reduces the number of guesses.
There is also a psychological benefit, though I would not sell the work on that alone. A ledger gives the team a place to put annoyance. Instead of “the AI got us wrong again,” the note becomes: English visitor query, Venice branch, aggregator cited, service boundary wrong, next action: add English branch proof. The problem becomes smaller without becoming trivial.
The Citation Ledger
Query shelf: “tracciare citazioni IA mese” (“track AI citations monthly”) for our Italian business.
Ranking residue: the old report still shows positions, but not assistant selection, source reuse or wording drift.
Citation hinge: the same query shelf must be checked on a calendar, with name, source and accuracy recorded separately.
Next count: repeat the ledger monthly and, when possible, change one evidence item before the next run.