Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity