After being roped in by my teenaged daughter, I've been playing a lot of Umamusume lately. It's a Japanese gacha game about "horse girls" (uma musume, or "horse daughter", in Japanese), schoolgirls with horse ears and tails in an alternate universe who inherit their names, personalities, and careers from (Japanese) racehorses in our world. The game obviously and doubtless intentionally appeals to people whose preferences run toward young women with animal features (kemonomimi in Japanese); that's not my cup of tea, but I do know perfectly lovely people who enjoy that particular beverage, and I won't judge. However, in what follows I am going to render some judgments about generative AI, specifically the sometimes ridiculous answers that Google's AI prepends to search results. But first, more on the game....
Rather than the girls, what does appeal to me about the game is the strategy, which involves a lot of probabilistic reasoning: right up my alley, given the love of statistics that led me to become a data scientist. As with any gacha game, there's a "meta" strategy of deciding how to parcel out scarce resources (scarce, that is, for players who don't "whale" by spending hundreds or thousands of dollars) to "pull" for the assets needed to play the game (horse girls and "support" cards in this case). Because it's a gacha game, you never know exactly what you're going to get when you pull, but you play the odds, working with the information you have about probability distributions while prioritizing the most useful targets.
Unlike the average gacha, Umamusume also features a deep tactical dimension of decision-making in the daily "grind" required to build up your team of horse girls. Each day, you guide at least one trainee through a "career", making training decisions and picking skills that, if you're lucky, will give her the stats and abilities she'll need for a place on your team. Sometimes you're working with explicit probabilities (like the probability that a skill will activate if its activation trigger occurs), and other times you're working with much vaguer contingencies (like the chance that the trigger condition will occur in the first place during an actual race), but in either case you're using probabilistic reasoning to make decisions, and yay, that's fun for a person like me (YMMV).
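A toy calculation makes that two-part reasoning concrete. The chance that a skill actually fires in a race is the chance its trigger condition occurs, multiplied by the chance it activates once triggered. This is a minimal sketch and nothing from the game's code. The function name is hypothetical, the sketch assumes the activation roll is made only once the trigger occurs, and the numbers for the two skills are made up for illustration.

```python
def chance_skill_fires(p_trigger: float, p_activate: float) -> float:
    """Probability that a skill fires during a race (hypothetical helper).

    p_trigger:  your estimate of the chance the trigger condition occurs
                in the race (the vague, judgment-call part).
    p_activate: the explicit chance the skill activates once triggered.
    """
    for p in (p_trigger, p_activate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    # P(fires) = P(trigger occurs) * P(activates | trigger occurs)
    return p_trigger * p_activate


# Made-up numbers for two hypothetical candidate skills, not real game data.
skill_a = chance_skill_fires(p_trigger=0.5, p_activate=0.9)  # reliable roll, iffy trigger
skill_b = chance_skill_fires(p_trigger=0.9, p_activate=0.6)  # shakier roll, likely trigger
print(f"A: {skill_a:.2f}, B: {skill_b:.2f}")
```

The comparison shows why the vaguer estimate matters. A skill with a lower activation chance can still be the better pick if its trigger is much more likely to occur.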
And though horse girls are not my cup of tea, I do find many of the characters quite appealing (shout-out here to King Halo, Nice Nature, and Narita Taishin), and I have to say that their background stories are well-written. That brings me to the Google search that inspired this post. To wit, I was watching the video story for an uma named Mejiro Dober when I encountered this:
If you're like me, you're wondering, "What's this 'Bell' business?" It's not an obvious nickname for a racer, like "Tiger" or "Beast" or even "Twinkle Toes", but its origin isn't explained within the story—or, for that matter, in any of the official Umamusume lore. So naturally I turn to Google and ask, "Why is Mejiro Dober called Bell?" and I get something like what you see below:
Note that this is the best of three answers Google's AI produced for me at different times. The first time, right after the character came out on the global server (which is ~3 years behind the original Japanese server), the AI flat-out insisted that Mejiro Dober is not in fact called Bell. The answer above (and another from a few hours earlier) at least hedges by acknowledging uncertainty, but both of these later answers are still categorically wrong, because the nickname does come from an official source. The AI also doesn't explicitly acknowledge that "Bell" does actually show up in search results (note the two circled hits below the AI summary), focusing instead only on the origin of the name.
Now you may be thinking, "Hey, Scott, you asked the wrong question: you asked why she was called 'Bell', not whether she was called 'Bell'." Yeah, that's true, because I already knew she was called that, and just wanted to know why. In all fairness, the "why" question is tougher, and in the first paragraph of the summary the AI quite rightly notes that it can't find an explanation. My problem is more with the second paragraph. But just for jollies, here's the AI's answer when I asked whether she's called Bell:
Still wrong.
Presumably, the AI is using retrieval-augmented generation (RAG), meaning it's not just spitting out something generated from what the model absorbed in training (like what would happen if you asked ChatGPT), but rather doing a web search and then summarizing the results. Kudos to Google for following RAG best practice by linking to the top sources next to or underneath the AI summary, but in both screenshots the websites the AI links to are actually the two commonly consulted fan-made wikis, not official sources (and the embedded video in the left-hand one is actually an ad on the wiki site for an interview with the cast of Stranger Things, so not at all helpful). The wikis are reasonably authoritative and reproduce art and text from the official website, which might confuse a well-meaning AI model, but they aren't official.
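For anyone who hasn't seen RAG spelled out, here is a deliberately toy sketch of the retrieve-then-summarize loop. It is not how Google's system works, since those details aren't public. The documents, the `.example` URLs, and the keyword-overlap scoring are hypothetical stand-ins. The "generation" step just quotes the retrieved snippets with their sources, where a real system would hand them to an LLM. It does show the basic constraint: the answer, and the links it cites, can only come from whatever the retriever ranks highest.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    url: str        # hypothetical URL, for illustration only
    text: str       # placeholder page text
    official: bool  # whether the page is an official source


def tokens(s: str) -> set[str]:
    """Lowercased word set, used for naive keyword-overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, index: list[Doc], k: int = 2) -> list[Doc]:
    """Return up to k documents sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(index, key=lambda d: len(q & tokens(d.text)), reverse=True)
    return [d for d in ranked[:k] if q & tokens(d.text)]


def answer(query: str, index: list[Doc]) -> str:
    """Stand-in for generation: quote each retrieved snippet with its source."""
    hits = retrieve(query, index)
    if not hits:
        return "No sources found."
    return "\n".join(f"{d.text} [{d.url}, official={d.official}]" for d in hits)


# Hypothetical index: only text that was crawled and indexed is searchable.
index = [
    Doc("https://official.example/dober",
        "Mejiro Dober: placeholder for a short official profile blurb.", True),
    Doc("https://fan-wiki.example/Mejiro_Dober",
        "Mejiro Dober is sometimes called Bell.", False),
]
print(answer("Why is Mejiro Dober called Bell?", index))
```

In this toy, the fan-wiki page outranks the sparse official page simply because it contains more of the query's words. The summary's top citation is therefore the wiki, however "official" the other source is.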
The obvious reason that Google's AI doesn't link to the official sources is that the Umamusume webpage for Dober contains all of one short paragraph of information. Beyond that, to get official info, you'd have to look at the game's social media accounts, the background story videos (helpfully posted on YouTube), and the anime associated with the game (also on YouTube, but not entirely canonical, as the anime characters often differ somewhat from their game counterparts). I'm not sure how, if at all, Google's AI consumes these sources, but it's entirely doable with modern multimodal large language models (LLMs) to extract the text from videos (they're subtitled!) and index it along with everything else Google stores from the web. And yes, you can do the same with spoken dialogue, which Google already transcribes when it generates subtitles with AI. Given the amount of information currently stored only in video format, you'd think the company would be doing that already. If Google's AI isn't looking at these more exotic sources (and it may well not be, given that "Bell" appears several times in the background story videos), then it's deceptive to state that it couldn't find anything in official sources, since it isn't actually looking at most of them in the first place.
As you might have guessed, the Dober/Bell mess is not the only questionable result about Umamusume that I've seen from Google's AI. Once, it even gave me advice on when to use skills during a race (not useful, because skill activation is automated and rule-based, not under the player's control), and I wish I'd screenshotted that one. That instance was uniquely bad, but here's a more typical response. I asked what the best skills are for Late Surgers (one of four racing styles, each defined by where they run relative to the rest of the pack for the majority of a race):
A few of these recommendations are good, and a few are highly questionable (though they probably represent something some ill-informed player posted somewhere), but I'd like to focus on the ones that are flat-out wrong.
Most obviously, the skills circled in red aren't realistically usable by Late Surgers. Speed Star works only for Pace Chasers (another style, which runs closer to the front). Reeling in the Big One (Seiun Sky's unique skill, which can be inherited by other uma) technically can work for any runner, but to trigger it, you have to be ahead on a corner late in the race, so it's usually used only by Front Runners.
The errors circled in blue are more subtle. Let's start with Uma Stan and Ramp Up: they're actually completely different skills, with entirely different trigger conditions, but neither one is likely to trigger in the late race (Ramp Up must trigger mid-race, and Uma Stan, because it can trigger any time a runner is close to 3 other runners, tends to trigger in the early race). Furious Feat and Position Pilfer seem to be presented as if they're different versions of the same skill. For comparison, above that line you'll see On Your Left!, which is the premium, or "gold", version of Slick Surge, and the same is true of Rising Dragon and Outer Swell. But Position Pilfer is actually the non-premium ("white") version of Fast & Furious, which sounds a lot like Furious Feat but isn't the same thing at all. Notably, Position Pilfer and Fast & Furious are restricted to Late Surgers. Furious Feat, by contrast, is readily usable by Late Surgers (it works on anyone in the back half of the pack) but is restricted to Mile-distance races. The conflation of these skills explains the weird reference to "Mile and other distances".
In short, the advice provided by the AI here is effectively useless: there are some sound suggestions, but you need to know Umamusume pretty well to pick the wheat from the chaff, and anyone who knew the game that well wouldn't be asking this question in the first place (or would be asking for far more detailed answers on each skill, considering pros and cons).
It's true that generative AI can excel at so-called "zero-shot" tasks, constructing new things (like a list of good Late Surger skills) by assembling information using well-established rules and relationships. But performing this kind of "transfer-learning" task successfully requires that the model discriminate between which things it can transfer from one domain to another and which things it can't. That works in domains like politics, economics, and even real-life horse racing, where the model can extract those rules and relationships from billions of words of text during training. However, it tends to fall apart in a highly specialized domain about which people have written comparatively little, especially if it's a general-purpose model (like Google's AI). In that case, the model might try to transfer rules and relationships it really shouldn't. This is how we get the answer I didn't think to screenshot, treating Umamusume as if it were a game that allows players to make decisions during a race (which is how most racing games work). And it's how we get a response like the one in the screenshot above, where the AI can't figure out the rules well enough to plug the nuggets of info it's pulled from the web into the right places.
The common thread between this and the Bell problem is that the AI model just doesn't have enough information on Umamusume to work with. For a human, there's more than enough information available to figure out the game. Training a generative AI, however, uses a brute-force approach that requires lots and lots and lots of info to learn patterns that humans can pick up with a few minutes of light reading.
Oh, in case you're still wondering why Mejiro Dober is called "Bell", I did finally figure that out. I had a hunch it was one of those things that make more sense in the original language than in translation, so I consulted Google Translate. Sure enough, it turns out the Japanese word for bell is "beru", while Dober's name in Japanese is actually (ignoring the subtleties of proper transliteration) "Mejiro Doberu". "Doberu" is short for "Doberuman", the Japanese version of "Doberman" (all the Mejiro Farm foals that year were named after breeds of dog). The pun would be so obvious to a Japanese speaker that it would likely rarely be commented on. That makes it hard for a RAG AI to find references to it, even if the AI were pulling from Japanese as well as English sources. An AI language model might be able to recognize and reproduce word play, but figuring out that an English nickname derives from word play in another language appears to be beyond this AI's capabilities.



