Another test of logical ability for LLMs?

laca_komputilulo@alien.top · 2 years ago

This is a valid critique about the form of the riddle.

Most riddles rely on out-of-context prior knowledge as part of a deductive chain of reasoning. This one is no different from the question about how many sisters someone has, which folks in this community use all the time.

Try the same question with badminton instead of chess. Then try it with singles tennis (to which 3.5 answers that the sixth brother was playing doubles tennis :)…

I hope this thread won't descend into deliberation on whether it is possible to play Battleship alone and how much fun it is :)

Be-Kind_Always-Learn@alien.top · 2 years ago

> Most riddles rely on out-of-context prior knowledge as part of a deductive chain of reasoning. This one is no different from the question about how many sisters someone has, which folks in this community use all the time.

Sure, but they do their best to avoid gaps that make the riddle unsolvable. A riddle like “a girl has as many brothers as sisters, but each brother has half as many brothers as sisters, how many sisters does she have?” has exactly one correct answer.
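The "exactly one correct answer" claim is easy to check mechanically. Below is a minimal brute-force sketch (the function name `solve_sisters` and the search bound are illustrative assumptions). It counts the girl's sisters excluding herself, and it uses the fact that each brother counts the girl among his sisters:

```python
def solve_sisters(max_size: int = 20) -> list[tuple[int, int]]:
    """Return every (sisters, brothers) pair that satisfies the riddle.

    sisters  = number of sisters the girl has (not counting herself)
    brothers = number of brothers in the family
    """
    solutions = []
    for sisters in range(max_size + 1):
        for brothers in range(1, max_size + 1):  # the riddle implies at least one brother
            # The girl has as many brothers as sisters.
            girl_ok = brothers == sisters
            # Each brother has (brothers - 1) brothers and (sisters + 1) sisters,
            # because the girl herself is one of his sisters.
            brother_ok = 2 * (brothers - 1) == sisters + 1
            if girl_ok and brother_ok:
                solutions.append((sisters, brothers))
    return solutions


print(solve_sisters())  # [(3, 3)]
```

The only solution has the girl with 3 sisters and 3 brothers. That makes four girls in total, and each brother has 2 brothers and 4 sisters. Because both constraints are linear, widening the search bound does not produce any further answers.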

But the gap in this one is just big enough that it's a problem. Like you said, replacing chess with a mandatory two-person activity is much better! (It's still open-ended, though, because nothing implies they are alone.) The other commenter changed the question to "where are they", which is also a good improvement!

> I hope this thread won't descend into deliberation on whether it is possible to play Battleship alone and how much fun it is :)

Anything to stop the losing streak!

laca_komputilulo@alien.top · 2 years ago

As usual, "beauty is in the eye of the beholder".
I think part of the point of these tests is to see whether a model can solve these logic puzzles despite all the richness and ambiguity of natural language. We've had deterministic theorem solvers that can solve these problems, once they are expressed as a closed set of facts, for decades.
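To illustrate the "closed set" point, here is a minimal sketch. It is a plain brute-force search, not a theorem prover. It encodes a riddle of this kind under explicit closed-world assumptions: the brother names, the activity list, and the facts for brothers A to E are all hypothetical. Nobody else is in the room, each brother does exactly one thing, and chess needs exactly two players present.

```python
# Hypothetical closed-world encoding of a "what is the sixth brother doing?" riddle.
TWO_PLAYER = {"chess"}  # activities that need exactly two participants
ACTIVITIES = ["reading", "cooking", "chess", "sleeping", "writing", "watching TV"]

# Hypothetical known facts: five brothers, one activity each.
known = {"A": "reading", "B": "cooking", "C": "chess", "D": "sleeping", "E": "writing"}


def consistent(assignment: dict[str, str]) -> bool:
    """Each two-player activity has either 0 or exactly 2 participants in the room."""
    for act in TWO_PLAYER:
        players = sum(1 for a in assignment.values() if a == act)
        if players not in (0, 2):
            return False
    return True


def solve(facts: dict[str, str], unknown: str = "F") -> list[str]:
    """List every activity for the unknown brother that keeps the room consistent."""
    return [act for act in ACTIVITIES if consistent({**facts, unknown: act})]


print(solve(known))  # ['chess']
```

Once the room is closed, the only consistent answer is that the sixth brother is C's chess partner. The ambiguity discussed in this thread comes from the parts such an encoding has to pin down explicitly, for example whether anyone else is present.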

That said, please see the capstone version of the prompt in the second update, which removes most of the ambiguity in line with the points you raised. It also drops the "singles" aspect of tennis, which consistently trips up in-context reasoning: it makes the weaker LLMs think it's a solo activity (despite an explicit clarification that follows).