
One might define the capability of an LLM as the number of tasks it solves correctly. This number is finite due to the length restriction on the LLM's input (the "context"): over a finite token vocabulary, there are only finitely many inputs of bounded length. Many different input texts can describe the same task, so each task is an equivalence class of essentially interchangeable texts.
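A minimal sketch of the finiteness argument, assuming an illustrative vocabulary size and context length (the figures are hypothetical, not taken from the comment):

```python
# With a finite vocabulary and a bounded context length, the number of
# distinct LLM inputs is finite; tasks (equivalence classes of inputs)
# are therefore finite too.
VOCAB_SIZE = 50_000      # hypothetical tokenizer vocabulary size
CONTEXT_LENGTH = 8_192   # hypothetical maximum input length in tokens

# Number of token sequences of length 0..CONTEXT_LENGTH, via the
# closed form of the geometric series sum of VOCAB_SIZE**k.
num_inputs = (VOCAB_SIZE ** (CONTEXT_LENGTH + 1) - 1) // (VOCAB_SIZE - 1)

# Astronomically large (tens of thousands of digits) but finite, and the
# number of tasks is at most this number.
print(len(str(num_inputs)))
```

The bound is loose, since most sequences describe no task at all, but looseness does not matter for the finiteness claim.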

Of course, capability increases have no linear effect on benchmark scores. However, given what LLM benchmarks are meant to measure, the relationship should be close to monotonic: a more capable model should not score worse. Hence the approach Gary Marcus takes in this article, reading flat benchmark scores as evidence of a lack of recent LLM capability improvements, seems valid.
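The distinction can be sketched with a toy score function (a hypothetical saturating curve, not any real benchmark): the capability-to-score mapping is nonlinear, yet it preserves the ranking of models, which is all the argument needs.

```python
import math

def benchmark_score(capability: float) -> float:
    """Hypothetical saturating score: nonlinear but strictly increasing."""
    return 100 * (1 - math.exp(-capability / 50))

# Equally spaced capability levels for five imaginary models.
capabilities = [10, 20, 30, 40, 50]
scores = [benchmark_score(c) for c in capabilities]

# Nonlinearity: equal capability steps yield shrinking score gains.
gains = [b - a for a, b in zip(scores, scores[1:])]

# Monotonicity: the ranking by score matches the ranking by capability,
# so flat scores over time do indicate flat capability.
print(scores == sorted(scores))  # True
```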
