At this point, he’s just using the older and worse models. His line of argument wouldn’t work at all if he used more modern ones (especially CoT models).
That's nonsense.
2 weeks ago I went to Claude, latest model. There were 3 sample test questions on the "suggested prompts", as usual. The middle one was the trivial "Whlch 1s bigger Test if Al knows which number is bigger." (AI text grabbed from a photo, excuse errors!)
> What is bigger, 9.9 or 9.11?
--
9.9 is smaller than 9.11.
9.11 has a larger digit in the tenths place (1) than 9.9. (which has 9 as the tenths digit and implicitly a 0 in the hundredths digit or 9.90).
--
I use paid ChatGPT, too. It is also useless, in the same way: I have to lead it to the right answer like a schoolchild. For anything factual and requiring precision, it is less work to do it myself.
I didn't test Claude Sonnet 3.5, but o3-mini and Gemini 2.0 Advanced got it right, while 1.5 failed. The point remains that these can be extraordinarily useful tools if used with awareness of their limitations. AGI, they are not.
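For what it's worth, the comparison the models keep fumbling takes one line to verify. A quick Python sanity check, both as floats and with exact decimal arithmetic:

```python
from decimal import Decimal

# Plain float comparison: 9.9 is 9.90, which is greater than 9.11
print(9.9 > 9.11)  # True

# Same result with exact decimal arithmetic (no float representation issues)
print(Decimal("9.9") > Decimal("9.11"))  # True
```

The tenths digit decides it: 9 in the tenths place beats 1, so 9.9 > 9.11, the opposite of what Claude claimed above.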
Which OpenAI model did you use? Honestly, it sounds like a skill issue.
Ironically, your comment unintentionally argues against itself. Why would it be a skill issue? Because the person, not the model, needs the skill to lead it to the correct answer...because the model does not actually give the correct answer!
Tell me you don’t understand current LLM technology without telling me you don’t understand current LLM technology
Well, you could just read the comments where people posted answers from other models, and either they or I identified the mistakes in them.
Some of them had errors; the ones best suited for the task had none. Practical LLMs of this type have only been around for a few years. Look at the early history of any technology. Expecting instant perfection is absurd and places far more faith in the technology than is in any way reasonable.