If working with AI has taught me anything, it’s to ask it absolutely NOTHING involving numbers. It’s fucking horrendous. Math, phone numbers, don’t ask it any of that. It’s just advanced autocomplete and it does not understand anything. Just use a search engine, ffs.
I asked my work’s AI just to give me a comma-separated list of strings that I gave it, and it returned a list where every string was “CREDIT_DEBIT_CARD_NUMBER”. The numbers were 12 digits, not 16. I asked it 3 times to give me the raw numbers and had to say exactly “these are 12 digits long not 16. Stop obfuscating it” before it gave me the right thing.
I’ve even had it be wrong about simple math. It’s just awful.
Yeah because it’s a text generator. You’re using the wrong tool for the job.
Exactly. But they tout this as “AI” instead of an LLM. I need to improve my kinda ok regex skills. They’re already better than almost anyone else’s on my team, but I can improve them.
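Case in point, the original task above doesn’t need an LLM at all. A minimal sketch, with made-up 12-digit numbers and text purely for illustration:

```
import re

# Hypothetical input: some text containing 12-digit identifiers
text = """
ref 123456789012 processed; ref 987654321098 pending;
ref 555555555555 failed
"""

# \b word boundaries so we don't grab pieces of longer digit runs
numbers = re.findall(r"\b\d{12}\b", text)

# Emit them as a comma-separated list, no obfuscation, no back-and-forth
print(", ".join(numbers))
# -> 123456789012, 987654321098, 555555555555
```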
It’s really crappy at trying to address its own mistakes. I find that it will get into an infinite error loop where it hops between 2-4 answers, none of which are correct. Sometimes it helps to explicitly instruct it to format the data provided and not edit it in any way, but I still get paranoid.
Either you’re bad at ChatGPT or I’m a machine whisperer, but I have a hard time believing Copilot couldn’t handle that. I regularly have it rewrite SQL code.
I was using Amazon Q, so it could just be the shitty LLM.
Oh yeah, that’s definitely shitty then, copilot does shit like that really easily
What models have you tried? I used local Llama 3.1 to help me with university math.
It seemed capable of solving differential equations and doing Laplace transforms. It made some mistakes during the calculations, like a math professor in a hurry.
What I found best was getting a solution from Llama and validating each step using WolframAlpha.
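If you’d rather sanity-check steps locally instead of (or alongside) WolframAlpha, SymPy can verify a transform or an ODE solution. A rough sketch, not my actual exam problems, just an illustration:

```
from sympy import symbols, Function, Eq, exp, dsolve, laplace_transform

t, s = symbols("t s", positive=True)

# Check a single step: L{e^(-2t)} should be 1/(s + 2)
print(laplace_transform(exp(-2 * t), t, s))  # (1/(s + 2), -2, True)

# Check a whole ODE solution: y' = -2*y  ->  y = C1*e^(-2t)
y = Function("y")
ode = Eq(y(t).diff(t), -2 * y(t))
print(dsolve(ode, y(t)))  # Eq(y(t), C1*exp(-2*t))
```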
Or, and hear me out on this, you could actually learn and understand it yourself! You know? The thing you go to university for?
What would you say if, say, it came to light that an engineer had outsourced the structural analysis of a bridge to some half-baked autocomplete? I’d lose any trust in that bridge and any respect for that engineer, and would hope they’re stripped of their title and held personally responsible.
These things are currently worse than useless precisely because they’re sometimes right. It gives people the false impression that you can actually rely on them.
Edit: just came across this MIT study regarding the cognitive impact of using LLMs: https://arxiv.org/abs/2506.08872
Now, I’m not saying you’re wrong, but having AI explain a complicated subject in simple terms can be one of the best ways to learn. Sometimes the professor is just that bad and you need a helping hand.
Agreed on the numbers, though. Just use WolframAlpha.
Anyone being patronizing about “not fully learning and understanding” subjects while calling neural networks “autocomplete” is an example of what they preach against. Even if they’re the crappiest AI around (they can be), LLMs have literally nothing to do with n-grams (which is basically what autocomplete is), Markov chains, regex parsers, etc. I guess people just lazily read “anti-AI hype” popular articles and mindlessly parrot them instead of bothering with layered perceptrons, linear algebra, decoders, etc.
The technology itself is promising, and it shouldn’t be gatekept by corporations. It’s usually corporate fine-tuning that makes LLMs far crappier than they could be. Besides Wolfram, there’s math-gpt (unrelated to OpenAI afaik, double-check to be sure) and customizable models on Hugging Face; ideally a local model is preferable for privacy and customization.
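As a sketch of the local-model route (the model ID below is a placeholder, not a recommendation; swap in whatever instruct model you’ve actually downloaded and can run):

```
# Rough sketch of running a local Hugging Face model with transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="some-org/some-instruct-model",  # placeholder, not a real model ID
)

out = generator(
    "Explain the Laplace transform in two sentences.",
    max_new_tokens=120,
)
print(out[0]["generated_text"])
```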
They’re great at explaining STEM related concepts, that’s unrelated to trying to use generic models for computation, getting bad results and dunking on the entire concept even though there are provers and reasoning models for that task that do great at it. Khan academy is also customizing an AI because they can be great for democratizing education, but it needs work. Too bad they’re using openAI models.
And like, the one doing statics for a few decades now is usually a gentleman called AutoCAD or Revit, so I guess we all need to thank Autodesk for bridges not collapsing. It would be very bizarre if anyone used non-specialized tools like random LLMs for that, but the idea that engineers do all the math by hand on paper, especially for huge projects, is kinda hilarious. Even more hilarious: Autodesk has been incorporating AI automation into newer versions of AutoCAD, so yeah, not exactly, but they kinda do build bridges lmao.
The reason I compare them to autocomplete is that they’re token predictors, just like autocomplete.
They take your prompt and predict the first word of the answer. Then they take the result and predict the next word. Repeat until a minimum length is reached and the answer seems complete. Yes, they’re a tad smarter than autocorrect, but they understand just as little of the text they produce: it will be mostly grammatically correct, but they don’t grasp what it means. Much like a compiler can tell you whether your code is syntactically correct but can’t judge the logic.
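Roughly, a toy version of that loop looks like this. Just a sketch with a stand-in scoring function, nothing like a real model; the point is only the shape of the loop (forward pass gives scores, softmax/argmax picks the next token):

```
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

def next_token_logits(context: list[str]) -> np.ndarray:
    """Stand-in for a model forward pass: one score per vocab entry.
    A real LLM computes these with stacks of matrix multiplies; here we
    just generate deterministic toy scores so the loop has something to do."""
    rng = np.random.default_rng(len(context))
    return rng.normal(size=len(VOCAB))

def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax at decode time
        next_tok = VOCAB[int(np.argmax(probs))]        # greedy decoding
        tokens.append(next_tok)
        if next_tok == "<eos>":
            break
    return tokens

print(generate(["the", "cat"]))
```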
You’re still describing an n-gram, and n-grams don’t scale or produce coherent text, for reasons that should be obvious: (a) an n-gram doesn’t answer questions at all, it would just continue your text instead of responding; (b) it’s only feasible for stuff like autocomplete, which fails constantly because n is like two words at most. The growth is exponential (basic combinatorics): for bigger n you quickly get huge lists of possible combinations, for n the size of a paragraph the table is computationally infeasible (basically like trying to crack one-time pads, at minimum), and anything beyond that is impossible due to physics. (c) Language is too dynamic and contextual to be statistically predictable anyway; even if you had an impossible system that could do any of the above in human-level time, it couldn’t answer things meaningfully, because there are a ton of “questions” that are computationally undecidable for purely statistical systems operating like n-grams. A question isn’t some self-contained, equation-like thing that contains its own answer through word-to-word probability distributions.
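To put rough numbers on the exponential growth in (b), a back-of-the-envelope sketch (the vocabulary size is made up, but in a realistic range for a tokenizer):

```
# Back-of-the-envelope: a full n-gram table has V**n possible contexts.
V = 50_000  # assumed vocabulary size -- made up, but in a realistic range
for n in (2, 3, 5, 50):
    print(f"n={n:>2}: {V ** n:.3e} possible n-grams")

# n= 2: 2.500e+09
# n= 3: 1.250e+14
# n= 5: 3.125e+23
# n=50: 8.882e+234   (for scale: roughly 1e80 atoms in the observable universe)
```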
Anyway, yeah, that’s the widespread “popular understanding” of how LLMs supposedly work, but that’s not what neural networks do at all. Emily Bender and a bunch of other people came up with slogans to fight “AI hype”: partly because they dislike techbros, partly because AI actually is hyped, and partly because computational linguists are salty that their own methods for text generation have completely failed to produce good results for decades, so they diss the competition to protect their little guild. All these inaccurate descriptions are how a computational linguist would imagine an LLM’s operation, i.e. n-grams, Markov chains, regex parsers, etc. That’s their own NLP stuff. The AI industry adopted all of it because representing LLMs (even the name is misleading, tbh) as mere next-token predictors helps them avoid liability and satisfies the “AI ethicists” at the same time (the hidden layers do dot products with matrices; the probability stuff is all decoder strategy plus a softmax applied after the output, not an inherent part of a neural network). “AI ethicists” meaning Bender et al. The industry even fine-tunes LLMs to repeat all that junk, so the misinformation continues.
The other thing, “they don’t understand anything”, is also Bender ripping off Searle’s Chinese Room crap (“they have syntactic but not semantic understanding”); she came up with another ridiculous example involving an octopus that mimics human communication without understanding it. Searle was trying to diss the old symbolic systems and the Turing Test; Bender reapplied it to LLMs, but it’s still a bunch of nonsense due to combinatorial impossibility. They’ve never shown how any system could communicate coherently without understanding; it’s just anti-AI hype and vibes. The industry has no incentive to argue against it, because it would be embarrassing to claim otherwise and then have badly designed and badly deployed AIs hallucinate. So they’re all basically saying that LLMs are philosophical zombies, but that’s unfalsifiable, and nobody can prove that random humans aren’t p-zombies either, so who cares from a CS perspective? It’s bad philosophy.
I don’t personally gaf about the petty politics of irrelevant academics. Perceptrons have been around, at least as basic theory, since the 1940s; this isn’t their field and LLMs don’t do what they think. No other neural network gets “explained” like this. It’s really not a big deal that an AI system achieved semantic comprehension after 80 years of pushing at it, even if the results are still often imperfect, especially since these goons rushed to mass-deploy systems that should still be in the lab.
And while I’m not on the hype, anti-hype, or omg-Skynet-hysteria bandwagons, I think this whole narrative is lowkey legitimately dangerous, considering that industrial LLMs in particular lie their ass off constantly to satisfy fine-tuned requirements, but that gets obscured by the strange idea that they don’t really understand what they’re yapping about and therefore it’s not real deception. Old NLP systems can’t even respond to questions, let alone lie about anything.
Getting an explanation is one thing; getting a complete solution is another, even if you then verify it with a better-suited tool. It’s still not your solution, and you didn’t fully understand it.
It was the last exam remaining before I would have been dropped from university. I wish I could have attended the lectures but, due to work, it was impossible. Also, my degree is not fully related to my field of work: I work as a software developer, and my degree is in electronics engineering. I just need a degree to get promoted.
Copilot and ChatGPT suuuuck at basic maths. I was doing coupon discount shit and it failed every one of them. It presented the right formula sometimes but still fucked up really simple stuff.
I asked Copilot to reference an old sheet, take column A, find its percentage completion in column B, and add ten percent to it in the new sheet. I ended up with everything showing 6000% completion.
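For reference, the calculation itself is a one-liner, and my guess (just a guess, not confirmed) at where 6000% came from is a value that was already a percentage getting scaled by 100 again. A sketch with made-up numbers, reading “add ten percent” as ten percentage points:

```
# Made-up example rows: (items_done, items_total) per row of "column A"
rows = [(3, 5), (12, 20), (7, 10)]

for done, total in rows:
    pct_complete = done / total * 100   # e.g. 60.0 (%)
    with_bump = pct_complete + 10       # add ten percentage points
    print(f"{with_bump:.0f}%")          # 70%, 70%, 80%

# One plausible route to the 6000% bug (assumption, not confirmed):
# taking a value that's already a percentage and multiplying by 100 again.
print(60.0 * 100)  # 6000.0
```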
Copilot is integrated into Excel; it’s woeful.
You’d think it’d be able to do math right since, ya know, we’ve kinda had calculators working for a long time.