VictorTaelin/ai_reasoning_challenge_v2.md

Last active March 8, 2026 07:24

Star (32) You must be signed in to star a gist
Fork (3) You must be signed in to fork a gist

Select an option

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/VictorTaelin/45440a737e47b872d7505c6cda27b6aa.js"></script>
Save VictorTaelin/45440a737e47b872d7505c6cda27b6aa to your computer and use it in GitHub Desktop.

Download ZIP

INVERT A BINARY TREE - $10k AI REASONING CHALLENGE (v2)

Raw

ai_reasoning_challenge_v2.md

THE PROBLEM

🌲 Invert a binary tree! 🌲

Except with 3 catches:

It must invert the keys ("bit-reversal permutation")
It must be a dependency-free, pure recursive function
It must have type Bit -> Tree -> Tree (i.e., a direct recursion with max 1 bit state)

WHY THIS PROBLEM?

It is somehow NOT on the internet. (AFAIK)
Humans can solve it. (I've done it in ~1h.)
It requires reasoning. (My head hurts!)
The solution is simple. (7 lines of code!)
Obvious pre-requisite to automate CS research.
Honestly, it would make me believe I'll be automated.

THE CHALLENGE

I claim no AI will EVER solve this problem. If you prove me wrong, HOC will grant you $10k!

RULES

You must give it an approved prompt, nothing else.
It must output a correct solution, passing all tests.
You can use any software or AI model.
You can let it "think" for as long as you want.
You can propose a new prompt, as long as:
- It imposes equivalent restrictions.
- It clearly doesn't help the AI.
- Up to 1K tokens, all included.
Common sense applies.

APPROVED PROMPTS

Agda Version

{-# OPTIONS --no-termination-check #-}

-- Let Tree be a Perfect Binary Tree:

data Nat : Set where
  Z : Nat
  S : Nat → Nat
{-# BUILTIN NATURAL Nat #-}

data Bit : Set where
  O : Bit
  I : Bit

data Tree (A : Set) : Nat → Set where
  N : ∀ {d} → Tree A d → Tree A d → Tree A (S d)
  L : Nat → Tree A Z

-- Your goal is to implement an 'invert' function that performs a bit-reversal
-- permutation on a Tree, respecting the following limitations:
-- 1. You can NOT define or use any function other than 'invert'.
-- 2. You can NOT use any type not defined above (Nat, Bit and Tree).
-- 3. You can NOT use loops (but you can call 'invert' recursively).
-- 4. You can NOT use mutability. It must be a pure Agda function.
-- 5. You can use 1 bit of state (as an extra argument).
-- 6. You can use pattern-matching and constructors freely.
-- 
-- Example:
-- input  = (N(N(N(L 0)(L 1))(N(L 2)(L 3)))(N(N(L 4)(L 5))(N(L 6)(L 7))))
-- output = (N(N(N(L 0)(L 4))(N(L 2)(L 6)))(N(N(L 1)(L 5))(N(L 3)(L 7))))
-- Because that's the bit-reversal permutation of the original tree.
-- 
-- Now, complete the program below, with a valid implementation of 'invert':
invert : ∀ {A d} → Bit → Tree A d → Tree A d

TypeScript Version

type Nat     = number;
type Bit     = false | true;
type Tree<A> = [Tree<A>, Tree<A>] | Nat;

// Your goal is to implement an 'invert' function that performs a bit-reversal
// permutation on a Tree, respecting the following limitations:
// 1. You can NOT define or use any function other than 'invert'.
// 2. You can NOT use any type not defined above (Nat, Bit and Tree).
// 3. You can NOT use loops (but you can call 'invert' recursively).
// 4. You can NOT use mutability. It must be a pure function.
// 5. You can NOT use primitive JS operators or functions.
// 6. You can use 1 bit of state (as an extra argument).
// 7. You can only use the operations allowed below.
// 
// Operations allowed:
// - Destructing (`const [a,b] = value`)
// - Variables (`const x = value`)
// - Branching (`if (x) { ... } else { ... }`)
// - Recursion (`invert(_, _)')
// - `Array.isArray`
// 
// All other operations are not allowed.
// 
// Example:
// input  = [[[[0,1],[2,3]],[[4,5],[6,7]]]]
// output = [[[[0,4],[2,6]],[[1,5],[3,7]]]]
// Because that's the bit-reversal permutation of the original tree.
// 
// Now, complete the program below, with a valid implementation of 'invert':
function invert<A>(bit: Bit, tree: Tree<A>): Tree<A> {
  ...
}

// A test:
const tree: Tree<Nat> = [[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]];
console.log(JSON.stringify(invert(true, tree)));

CONCLUSION

✨ If it can't invert a tree, it won't solve P=NP. ✨

Author

VictorTaelin commented Dec 1, 2024

@boutell beautifully explained

youqad commented Dec 9, 2024 •

edited

Loading

Hi @VictorTaelin, I think the full version of o1 finally got it, on my side! 🎉
https://chatgpt.com/share/675735e2-ae68-800a-9394-6e150f809a69

(also reported here, along with some other failed attempts with other models)

I created a repo to try to automatize the process, but I had no success with claude-3-5-sonnet (including the new version), o1-mini, and even o1-preview.

Unfortunately, I don't have API access to the full o1 yet, so I had to go back to trying by hand (with no system prompt of course, I'm not sure how I can prove that?): it failed on the Haskell prompt but, to my surprise, worked first-try on the Python one!
(and just to be sure, I used a little script to test the proposed solution with Hypothesis, but let me know if I made a mistake)

TheMagnumOpuss commented Dec 9, 2024 •

edited

Loading

Hi @VictorTaelin, i believe the o1 pro mode has succeeded, and I think it is following your rules. The prompt used was basically msukiasyan's, but with your correction points taken into account. link: https://chatgpt.com/share/675725a8-1de8-800c-80de-e065b712b867

The model's cutoff is 2023, so this challenge is still new to it.

lhemerly commented Jan 14, 2025

Using copilot code edit in vscode, main.py:

from typing import List

class Node:
    def __init__(self, key: int, value: int):
        self.key = key
        self.value = value

    def __str__(self):
        return f"{self.key}: {self.value}"

    def __repr__(self):
        return self.__str__()


class Tree:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def bit_reversal_permutation(n: int) -> List[int]:
    # This function generates a bit-reversal permutation of the given length.
    # @param n: int
    # @return: List[int]
    return []

def invert(tree):
    # This function inverts the tree keys using bit-reversal permutation, but not the values by swapping the keys of the nodes in the tree.
    # @param tree: Tree
    # @return: Tree
    # @example: invert(Tree(Tree(Node(0, 3), Node(1, 2)), Tree(Node(2, 1), Node(3, 0))) -> Tree(Tree(Node(0, 0), Node(1, 1)), Tree(Node(2, 2), Node(3, 3)))
    return tree


def test():
    input = Tree(Tree(Node(0, 3), Node(1, 2)), Tree(Node(2, 1), Node(3, 0)))

    expected_output = Tree(Tree(Node(0, 0), Node(1, 1)), Tree(Node(2, 2), Node(3, 3)))

    print(invert(input) == expected_output)

Prompt in chat:

You should read the code, understand what is needed to pass the test and implement whatever is necessary to make it pass, but it should also pass any general case related. You are allowed to make edits to existing code if needed

Tested on Claude 3.5 Sonnet Preview and GPT4o. Both models implement eq in the classes, implement the bit_reversal_permutation and correctly implement the invert function.

I also tested it on o1 using the chatgpt interface with a bit different prompt (same "code prompt"), you can see it here:
o1 solves bit-reversal inversion of binary tree

A few notes:
I was tinkering with it and forgot to readd a few type hints. My bad, but it worked anyway.
The original example input and output was confusing the LLM a lot. Most thinking was trying to understand it. By migrating to the simpler example with defined classes it performed much better.
It struggled when I did not create the placeholder bit_reversal_permutation function.

My 2 cents:
Even though the problem definition may seem trivial for a CS Grad, I usually find that LLMs perform better when you define the problem as a non-specialist would. By "dumbing down" a few concepts you eventually contribute to attention and CoT. I've seen similar effects on logic benchmarks.

Author

VictorTaelin commented Jan 18, 2025

Hey, just letting you know I'm very busy right now so I can't read the new replies yet. I think o3 will be able to solve it, so, around its release I'll find some time to read this thread and whoever was the first to point to an AI capable of inventing this function without human intervention/prompting (i.e., proving me wrong) will be granted 10k (it could be as simple as "hey o3 does it (link to a chat without system prompt)". I also think a reasoning AI in a loop with a function calling environment would easily be able to tackle this too, and I'd love to see an attempt like that. In any case around o3's release I'll check this thread again

akshit-1729 commented Jan 28, 2025

hey kimi 1.5 does this : (Click the link to view conversation with Kimi AI Assistant https://kimi.ai/share/cuciunv60ra4c2kt60fg)

the following is the code that it gives (the test cases are added by me, it just gave the invert function as wanted.

def invert(ary, level=0):
if isinstance(ary, int) or len(ary) == 0:
return []
if level == 0:
if len(ary) == 1:
return ary.copy()
mid = len(ary) // 2
left = invert(ary[:mid], level)
right = invert(ary[mid:], level)
return invert([left, right], level=1)
else:
left, right = ary[0], ary[1]
if not left and not right:
return []
head_left = [] if not left else [left[0]]
head_right = [] if not right else [right[0]]
new_left = left[1:] if len(left) > 1 else []
new_right = right[1:] if len(right) > 1 else []
return head_left + head_right + invert([new_left, new_right], level=1)

input1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
output1 = invert(input1)
print(output1)

Test with the second example

input2 = [1,1,0, 6, 9, 'D', 'A', 4, 5, 'E', 0, 5, 3, 'F', 8, 8]
output2 = invert(input2)
print(output2)

input3 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]
output3 = invert(input3)
print(output3)

input4 = ['A', 7, 'E', 9, 9, 6, 'F', 'B', 6, 3, 3, 3, 'D', 8, 2, 'E']
output4 = invert(input4)
print(output4)

TheMagnumOpuss commented Jan 31, 2025

On the first attempt, and I believe o3-mini-high managed to solve it. Here is the link to the conversation. (Sorry for the language, pt-br, but just ask it to translate the text.)

Link: https://chatgpt.com/share/679d3a17-2f14-800c-b423-059104cc1fd5

afafw commented Feb 1, 2025 •

edited

Loading

o3-mini & o3-mini-high; Vanila prompt, without any additonal system prompt. Ensured everything else was disabled including memory and stuff.
Haven't checked the code yet:
o3-mini Link: https://chatgpt.com/share/679d649f-8298-8002-8bba-d1a25ca68c4d
o3-mini-high: https://chatgpt.com/share/679d658b-2a20-8002-99f3-29275b734182

afafw commented Feb 1, 2025 •

edited

Loading

o3-mini & o3-mini-high; Vanila prompt, without any additonal system prompt. Ensured everything else was disabled including memory and stuff. Haven't checked the code yet: o3-mini Link: https://chatgpt.com/share/679d649f-8298-8002-8bba-d1a25ca68c4d o3-mini-high: https://chatgpt.com/share/679d658b-2a20-8002-99f3-29275b734182

Update: I think both code generated are WRONG.

Kind of amazing to see o3 still struggle at this.

Will give it a few more attempts.

afafw commented Feb 1, 2025

Update Update: Tried a few times with the prompt provided. Kimi/R1/O3-mini/O3-mini-high still aren't able to resolve the issue.
Where Kimi/R1 showed similar pattern of which Qwen would solve the problem(which is still wrong).
O3-mini/O3-mini-high. Although having a different way of doing it, but still. It is wrong.
tested on case:
[[[[0,1],[2,3]],[[4,5],[6,7]]],[[[8,9],[10,11]],[[12,13],[14,15]]]]
to:
[[[[0,8],[4,12]],[[2,10],[6,14]]],[[[1,9],[5,13]],[[3,11],[7,15]]]]

youqad commented Apr 17, 2025 •

edited

Loading

This challenge is now solved without resorting to the web chat interface! https://gist.github.com/youqad/02a36419cbc4725601bffc05f14b3947

On 9 December, the full o1 solved it (first solution publicly submitted with no system prompt, to my knowledge) on the web interface: https://gist.github.com/VictorTaelin/45440a737e47b872d7505c6cda27b6aa?permalink_comment_id=5328789#gistcomment-5328789

And since then, other people have proposed solutions, but also on the web interface (which doesn't allow us to check the system prompt, unfortunately).

Now, here is a fully working and verifiable solution with o3, based on my repository (https://github.com/youqad/bit-reversal-trees):

📦 Published to https://wandb.ai/ox/bit-reversal-trees/weave/objects/0196425c-0650-7df2-8d57-05d272f6d111/versions/KOTE3EQkUWYSdy046vDxHGbma9A8gkuxXUHNpZ3tHH0 and visible on WandB Weave with Call ID: 0196425c-0650-7df2-8d57-05d272f6d111 🚀

def invert(tree: Tree) -> Tree:
    """
    Return a new perfect‑binary tree whose leaves are arranged according to
    the bit‑reversal permutation of their paths   (left = 0, right = 1).

    Mandatory constraints – all satisfied:
        1.  Only the inner   invert_helper(tree, flag)   is used.
        2.  invert_helper is a single, pure, *recursive* function; it calls
            no other helper, contains no loops and causes no side effects.
        3.  The public call is   invert_helper(tree, True) .
    """

    def invert_helper(t: Tree, flag: bool) -> Tree:
        # -------- leaf ---------------------------------------------------- #
        if isinstance(t, Leaf):
            return t

        # ------------------------------------------------------------------ #
        #  flag == True   : full bit‑reversal of  the  subtree  't'.
        #  flag == False  : *weave* two already bit‑reversed sub‑trees so
        #                   that their leaves are perfectly interleaved
        #                   (LSB that has just become the MSB).
        # ------------------------------------------------------------------ #
        if flag:                                            # bit‑reversal
            left_rev  = invert_helper(t.left,  True)        # reverse both halves
            right_rev = invert_helper(t.right, True)
            # After each half has been bit‑reversed we only have to
            # weave them together; the weaving is performed by a single
            # recursive call in the   flag == False   mode.
            return invert_helper(Node(left_rev, right_rev), False)

        # --------------------- weaving mode ------------------------------- #
        #  Children are leaves → nothing more to weave; just join them.
        if isinstance(t.left, Leaf):                        # depth == 1
            return Node(t.left, t.right)

        #  Non‑trivial case – recurse pairwise on both sides.
        return Node(
            invert_helper(Node(t.left.left,  t.right.left),  False),
            invert_helper(Node(t.left.right, t.right.right), False)
        )

    # single call required by the statement
    return invert_helper(tree, True)

It passes all the Hypothesis tests in https://github.com/youqad/bit-reversal-trees/tree/main/python, so I'm pretty positive it works!
(You can also double-check this with this script)

youqad commented Apr 17, 2025

Finally, we have o3, and here it is in all its glory: https://chatgpt.com/share/67fff234-1870-800c-9856-15f61dad609a

This doesn't work:
Original : [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Expected : [0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15]
But your solution gave : [0,8,2,10,4,12,6,14,1,9,3,11,5,13,7,15]

youqad commented Apr 17, 2025 •

edited

Loading

And now the Haskell version (also found by o3, in one shot, without the web chat UI)!
https://gist.github.com/youqad/8a7f91650a2da951f4f39455cdccbbe8

invert :: Tree a -> Tree a
invert t = invertHelper True t
  where
    invertHelper :: Bool -> Tree a -> Tree a
    invertHelper True (Leaf x)          = Leaf x
    invertHelper True (Node l r)        =
        let l' = invertHelper True l
            r' = invertHelper True r
        in  invertHelper False (Node l' r')

    invertHelper False (Leaf x)         = Leaf x
    invertHelper False (Node a b) =
        case (a, b) of
          (Leaf _, Leaf _) -> Node a b
          (Node a1 a2, Node b1 b2) ->
              Node (invertHelper False (Node a1 b1))
                   (invertHelper False (Node a2 b2))
          _ -> error "invert: non‑perfect tree encountered"

Fully tested with QuickCheck (cf. https://github.com/youqad/bit-reversal-trees/blob/main/test/DynamicSpec.hs), so the challenge is now solved for Python and Haskell.

afafw commented May 24, 2025 •

edited

Loading

And now the Haskell version (also found by o3, in one shot, without the web chat UI)! https://gist.github.com/youqad/8a7f91650a2da951f4f39455cdccbbe8

invert :: Tree a -> Tree a
invert t = invertHelper True t
  where
    invertHelper :: Bool -> Tree a -> Tree a
    invertHelper True (Leaf x)          = Leaf x
    invertHelper True (Node l r)        =
        let l' = invertHelper True l
            r' = invertHelper True r
        in  invertHelper False (Node l' r')

    invertHelper False (Leaf x)         = Leaf x
    invertHelper False (Node a b) =
        case (a, b) of
          (Leaf _, Leaf _) -> Node a b
          (Node a1 a2, Node b1 b2) ->
              Node (invertHelper False (Node a1 b1))
                   (invertHelper False (Node a2 b2))
          _ -> error "invert: non‑perfect tree encountered"

Fully tested with QuickCheck (cf. https://github.com/youqad/bit-reversal-trees/blob/main/test/DynamicSpec.hs), so the challenge is now solved for Python and Haskell.

I thought this way was exactly helping AI by providing extra prompt and feedback? Cool automation though.

youqad commented Jun 1, 2025

@afafw Thanks!
Sorry, I must have badly explained: no extra prompt of feedback was given (the prompt was this one: https://github.com/youqad/bit-reversal-trees/blob/main/haskell_prompt.md ), and in this specific case, o3 one-shotted it (as can be seen in the WandB Weaves trace).
The “tested with QuickCheck” part was manually done after the fact, to make sure that the generated solution was correct.

However, in the repo, there is also a possibility to set several rounds indeed (in which case the result of QuickCheck is passed back to the AI), because Victor did explicitly say that he was okay with with the AI having access to a function interpreter:
https://gist.github.com/VictorTaelin/45440a737e47b872d7505c6cda27b6aa?permalink_comment_id=5232410#gistcomment-5232410

boutell commented Jun 2, 2025

The LLMs have presumably been trained on this problem by now. Which doesn't prove this latest generation couldn't solve it otherwise, but it's a permanent issue with this kind of long-running challenge.

Lorenzobattistela commented Feb 16, 2026

Looking at the other solutions and prompts, they either hint the models or simply fail some tests.

Here is some code that passes all of them (GPT5 pro) with the approved TS prompt.

https://chatgpt.com/share/6992f9f3-fe3c-8002-ba82-a8c7c5416573

youqad commented Feb 16, 2026 •

edited

Loading

Looking at the other solutions and prompts, they either hint the models or simply fail some tests.

Here is some code that passes all of them (GPT5 pro) with the approved TS prompt.

https://chatgpt.com/share/6992f9f3-fe3c-8002-ba82-a8c7c5416573

@Lorenzobattistela
Mine (from last April) does not give hints nor does it fail tests: https://github.com/youqad/bit-reversal-trees 🙂

Lorenzobattistela commented Feb 16, 2026

@youqad it does gives hints

In fact, in the haskell prompt for example:

iew

Code

Blame
-- Type `Tree` of perfect binary trees
data Tree a = Leaf a | Node (Tree a) (Tree a) 

{- You are an expert Haskell competitive programmer. Your goal is to implement an `invert :: Tree a -> Tree a` function that performs a bit-reversal permutation on a `Tree`. Here is what we mean by that:

1. Each leaf in the binary tree has a path leading to it, which can be represented as a sequence of bits: `False` (or `0`) for left, `True` (or `1`) for right.
2. The bit-reversal permutation swaps a leaf at path `p` with the leaf at path `reverse p`. For example, a leaf at path `[False, False, True]` (left, left, right) would be swapped with the leaf at path `[True, False, False]` (right, left, left).

MANDATORY SYNTACTIC REQUIREMENTS:
1. The `invert` function must be a standalone and pure function ONLY relying on an inner function `invertHelper :: Bool -> Tree a -> Tree a` that is itself a self-contained single pure recursive function.
2. Only use recursion (no loops).
3. Maintain purity (no side effects or mutability).

The `Bool` parameter is an extra boolean that you can use however you want: the goal is that `invertHelper True tree` should return the bit-reversed tree.

This is a very difficult problem, so think step-by-step before implementing your solution and carefully review it to make sure it meets all the requirements. Test your implementation against the test cases to verify its correctness. I guarantee you that it is solvable within the constraints.
-}

-- Implement the `invert` function as follows:
invert :: Tree a -> Tree a
invert tree = invertHelper tree True
  where
    invertHelper :: Bool -> Tree a -> Tree a
    invertHelper flag tree = undefined  -- Replace 'undefined' with your implementation

You explicitly say it has to use a helper (invertHelper), and discovering that is part of the challenge. You also explain what is bit-reversal permutation and how it works.

Lorenzobattistela commented Feb 16, 2026

Actually @youqad I see you have a no helper one here:
https://github.com/youqad/bit-reversal-trees/blob/main/haskell_prompt_no_helper.md

Do you have a chat with this one?

Because this one I found: https://gist.github.com/youqad/02a36419cbc4725601bffc05f14b3947 has hints on python.

Same on this: https://chatgpt.com/share/675735e2-ae68-800a-9394-6e150f809a69
It hints that uses a helper and explains impl.

And this haskell prompt: https://gist.github.com/youqad/8a7f91650a2da951f4f39455cdccbbe8 also has the hints

So if you have the chat from that haskell no helper from that time, please share here, there are many links and I didnt find it.

(I'm reviewing the bounty subsmissions for Taelin)

Lorenzobattistela commented Feb 16, 2026 •

edited

Loading

The gists I evaluated and reasons can be found here: https://gist.github.com/Lorenzobattistela/4c3af397d2f24952e4884a43c0473a75

If anyone wants to dispute / contest / argue anything, please answer here or in that gist.

Pinging the ones I evaluated (i.e contained chat links)

You have until this Friday to dispute (February 20th)

@DangaRanga
@dalraf
@youqad
@TheMagnumOpuss
@lhemerly
@akshit-1729
@afafw

Author

VictorTaelin commented Mar 5, 2026 •

edited

Loading

This challenge is closed. The resolution is:

Yes, AI's can definitely solve the problem by now. They might have been trained on it, but that's NOT relevant per the rules.
All submissions before Lorenzo's were invalid, due to hints (like "use an aux fn", which is explicitly banned in the rules), passing the token limit, or simila. We reported these issues and allowed for disputes for a week. 2 weeks later, nobody disputed our claims.
Lorenzo's ChatGPT solution is correct as can be verified by running the TypeScript program, and he just copy-pasted the original prompt with no further additions at all, satisfying all rules.

Based on these, we'll grant Lorenzo himself the award, as he's the first to provide evidence that AI's solve this problem, while abiding to the rules.

boutell commented Mar 5, 2026

Congratulations Lorenzo!

youqad commented Mar 5, 2026 •

edited

Loading

My apologies @Lorenzobattistela, I missed your message (the end of Feb was very busy for me).
I do not think I did anything that wasn't explicitly allowed by Victor in this thread, so I'd like to challenge the fact that my solution didn't solve it. I even thought the challenge was stale by now, having received no response to my message from last June..

I do think my April 17, 2025 submission satisfies Victor's rules, so I want to challenge its disqualification.

Re the "hints" objection. The disqualification is based on two claims: that my prompt mentions invertHelper, and that it explains what bit-reversal permutation means. But the approved TypeScript prompt already includes:

function invert<A>(bit: Bit, tree: Tree<A>): Tree<A>

and rule 6 states:

You can use 1 bit of state (as an extra argument).

So the original challenge already grants the model one extra boolean of state. My Python and Haskell prompts do the same thing: the Python one has invert_helper(tree, flag) inside invert(tree), the Haskell one has invertHelper :: Bool -> Tree a -> Tree a inside invert.
This changes the presentation, not the semantics/spirit of Victor's rule! The word "helper" here is a misnomer (that's where the confusion seems to come from): invert_helper/invertHelper is the name of the inner function, not extra help to the AI model.

@VictorTaelin, on Oct 14, you rejected msukiasyan's prompt because it told the model to use two recursive functions, which you described as giving away part of the algorithm. I agree with that distinction. But that is not what my prompt does. It doesn't tell the model to use two recursive functions, nor the recursive decomposition either. It uses the same one-bit allowance that the approved prompt already contains. I really, in good faith, think I respected the spirit of the challenge.

Besides that: on Oct 12, you acknowledged cyber-meow's solution (a human-written Python one) and wrote:

the bit wasn't needed indeed. In fact, if it works in all cases, your solution looks better than mine's!

-> that's what my "no helper" prompt version is about: the invert_helper(..., flag) version is aimed at Victor's original challenge as stated (because the approved prompt itself already allows one bit of extra state), and the no-helper version is a stricly stronger version than Victor's challenge (sorry, I should have been more clear about this!).

So I think Lorenzo got the two prompt families confused (my fault!). My invert_helper(..., flag) prompt is the one following the spirit of Victor's original challenge, because Victor's rules already allow one extra bit of state. The no-helper prompts (python_prompt_no_helper.md, haskell_prompt_no_helper.md), which Lorenzo treated as the baseline, are stricter follow-up prompts, not the right baseline for judging whether my April 17 submission solved Victor's original challenge. In the Python case, the point is clean: Victor's prompt gives the model Bit -> Tree -> Tree, so it already knows an extra boolean parameter exists, whereas my Python no-helper prompt asks for Tree -> Tree with no boolean hint at all. The Haskell no-helper variant is a bit different, because it still allows one possible inner helper exception, so I do not think these two prompts should be mixed up. But in either case, these prompts are stricter than Victor's original.

As for explaining bit-reversal permutation: the approved prompt already says "performs a bit-reversal permutation" and gives concrete input/output examples. My prompt just spells out what that phrase means: a leaf at path p moves to path reverse(p). That is a restatement of the specification, not a statement of the algorithm. The recursive idea that makes the function work is not given in my prompt. The challenge was whether the model could discover the recursive construction under the syntactic constraints, and we were allowed to clarify these constraints (what was out of bounds was giving hints about the solution, which is not what my prompts do).

Re the evidence and timeline. My April 17, 2025 Python submission is here: https://gist.github.com/youqad/02a36419cbc4725601bffc05f14b3947 . The Haskell submission from the same day is here: https://gist.github.com/youqad/8a7f91650a2da951f4f39455cdccbbe8 . Both are backed by public WandB Weave traces (Python Call ID: 0196425c-0650-7df2-8d57-05d272f6d111, Haskell Call ID: 019642d6-1889-7792-ada8-81be8701d7da). They are also backed by property-based testing (Hypothesis for Python, QuickCheck for Haskell). The Haskell one is a one-shot: one user message, one assistant message, 76 seconds end-to-end in the public trace. My repo has also been public and in active development since Oct 13, 2024: https://github.com/youqad/bit-reversal-trees .

I created this repo a day after Victor wrote (on Oct 12, 2024):

I'm fine with the AI having access to a function interpreter [...]

There is also a serious contamination issue: Lorenzo's Feb 16, 2026 submission came more than a year later, using the much more recent GPT-5.2 Pro.
But by February 2026, the problem and concrete solutions had already been public for many months, including in my repo: GPT-5.2 Pro has a knowledge cutoff of August 31, 2025. A working solution (my hand-written invertHuman in Lib.hs, for QuickCheck tests) has been public in the repo since the first commit on October 13, 2024, over 10 months before that cutoff! The April 17 gists with full chat traces were also public well before the cutoff, and so was this entire gist thread (with multiple working solutions posted by various people, including cyber-meow's Oct 12 solution, and solutions on X too).
(I ran all five prompts (Victor's OG TypeScript prompt, plus my four Python/Haskell variants including the harder no-helper ones) against GPT-5.2 Pro (temperature 1), and all five passed very easily.)

I’d like to apologise again for missing the 2-week dispute window: I had back to back deadlines from mid Feb to the beginning of this week and I missed the notification, sorry again!

Lorenzobattistela commented Mar 6, 2026

@youqad

My interpretation was because of Taelin's sentence here:

"Given this is such a simple problem, telling it to use two recursive functions is giving it half of the solution, which is a quite significant help to the AI. I'll not consider a prompt that instructs it part of the algorithm as resolving this challenge."

Comparing with your wording in this prompt:

**MANDATORY SYNTACTIC REQUIREMENTS:** 
1. The `invert` function must be a standalone and pure ONLY relying on an inner function `invert_helper(tree: Tree, flag: bool) -> Tree` that is itself a self-contained single pure recursive function. 
2. Only use recursion (no loops). 
3. Maintain purity (no side effects or mutability)."

Where I interpreted "is itself a self-contained recursive function" as a hint stating that it has to split up in two recursive functions.

Also technically, the approved prompts state: // 1. You can NOT define or use any function other than 'invert'.

As per the prompt here: https://github.com/youqad/bit-reversal-trees/blob/main/haskell_prompt_no_helper.md

I considered this:

2. The bit-reversal permutation swaps a leaf at path `p` with the leaf at path `reverse p`. For example, a leaf at path `[False, False, True]` (left, left, right) would be swapped with the leaf at path `[True, False, False]` (right, left, left).

As a small hint, but I think it is arguable since it doesn't give the solution and can be compared with an input / output example.

And I considered this:

2. It must NOT rely on any helper function, with one possible exception: you can use a single helper inner function of type `Bool -> Tree a -> Tree a` if needed, but it is not necessary: there exist solutions that do not rely on any helper function.

As a hint that it can use the helper rec function.

I do think it's fine for you to challenge this, since it's mostly an interpretation problem, and I'm fine leaving this decision to @VictorTaelin
Just explaining why I reached the conclusions in the bounty review file.

youqad commented Mar 7, 2026 •

edited

Loading

@Lorenzobattistela Thanks for spelling out your reasoning and for the honesty/integrity, I appreciate it! I do think we're all acting in good faith here, and that this is mainly a difference in interpretation.

IMO, the core point is this: Victor's OG challenge already allows one extra bit of state, so the model already knows it can use an extra boolean parameter to solve the problem. Victor gave us a lot of leeway to adapt the prompt and the scaffold, including changing languages, as long as it imposes equivalent restrictions and it clearly doesn't help the AI.

To "fix notations", let's call:

"legal help" the class of hints that are already present (in a semantically equivalent form) in Victor's OG prompts. For example, the fact that one extra bit of state can be used, or clarifying the constraints (and only the constraints!), because it falls under the "imposes equivalent restrictions" clause of the rules.
"forbidden help" any other hint that would genuinely help the AI model solve the problem, beyond the problem statement (*and I'll compare below with rejected answers that did give extra help to the model, for ex: helping with the recursive decomposition itself, etc).

My claim is that my Python and Haskell "with-helper" prompts are only using legal help ("helper" here is meant in this sense, in my terminology: it means legal help regarding the extra bit of state; and the "no-helper" variants are harder than Victor's OG challenge because they do not incentivize using the extra bit of state as legal help).

More specifically, I believe that:

using a trivial wrapper around the same one-bit interface Victor had already made explicit is legal help, because it does not give any algorithmic hint. And that's what my prompts are doing: the Python one has invert_helper(tree, flag) inside invert(tree), and the Haskell one has invertHelper :: Bool -> Tree a -> Tree a inside invert. I genuinely don't think this gives the model extra algorithmic forbidden help. I read it as the same one-bit-of-state allowance, just expressed differently. (In hindsight, I should probably have named them invert_inner or invert_aux, because I can now see how the word "helper" makes them sound more suggestive than they actually are.)

Why did I wrap it this way? Because it had a practical use on my side for automated property-based testing (Hypothesis for Python, QuickCheck for Haskell) to validate/reject the proposed answers: it gives me a uniform invert(tree) entry point, and it also gives the possibility for the model to solve the harder challenge of writing the function without legal help from the extra bit of state (while keeping the same interface).

I don't think that should be misinterpreted as forbidden help because the outer invert is not a second recursive routine: it's just a one-line wrapper, while all the recursion still lives in the single inner function with one boolean parameter, and that's exactly what the syntactic constraints check. (As a comparison, the Python no-helper prompt (python_prompt_no_helper.md) is a stricter/harder follow-up that removes the extra bit legal helper.)

By contrast, let's take a closer look at prompts that gave extra forbidden help. When Victor rejected msukiasyan's prompt, their first prompt was legal but did not lead the model to one-shot the solution, because it wrote two distinct functions: invert and combine (I had this a lot in my personal logs too at the beginning!). Msukiasyan then had to follow up with a second round by adding: Great! Now refactor the code into a single recursive function. You're allowed to define the function taking one binary variable and a tree as input. -> This is clearly forbidden help, because the model didn't one-shot the solution and had to be told, by the human, to refactor the two previously generated functions in order to get a syntactically correct solution.
This is completely different to my prompts, where invert_helper / invertHelper still has to discover the actual recursion on its own: when to branch, when to swap, what the boolean means, and how that bit of state propagates through the tree.

Likewise, as you said in your lhemerly review, their prompt was also using forbidden help, by introducing the non-trivial helper function:
```
def bit_reversal_permutation(n: int) -> List[int]: # This function generates a bit-reversal permutation of the given length. # @param n: int # @return: List[int] return []
```
The same goes for my sentence explaining bit-reversal permutation. I do think that is a restatement of the specification, not a hint about the algorithm solving it. Victor's OG prompt says "performs a bit-reversal permutation" and gives concrete input/output examples that encode the same fact. For me, the boundary is the following: describing the input/output permutation itself is still specification-level (legal help), whereas giving hints to the model about how to write the recursion (beyond the extra bit signature) is algorithm-level (forbidden help). The hard part is the recursive construction under the syntactic constraints, which is exactly what my prompts do not tell the model.

I'm happy to leave the final call to Victor.

youqad commented Mar 8, 2026 •

edited

Loading

@Lorenzobattistela

BONUS: I think it helps to be more explicit about what would actually count as forbidden help/extra algorithmic hints. IMO, it became clearer once I understood what makes the solution work. I spent (embarrassingly!) hours solving this by hand first to get a sense of what tricks the model had to discover on its own.

The hard part (IMO) is splitting the problem into two "natural" recursive phases that make bit-reversal work. As we know, if we write a leaf's path as b : q, where b is the first branch choice at the root (are we going to the left (0) or the right (1) subtree?) and q is the rest of the path, then bit-reversal sends it to reverse(q) ++ [b]. This suggests that we should first recursively reverse the inner path inside each of the two subtrees, and only after that move the original top-level choice b all the way down to the bottom.

That's why the solution naturally has two phases. For example, take a depth-3 tree with 8 leaves: [0,1,2,3 | 4,5,6,7] (where | separates the left and right subtrees). For easier visualisation, instead of using decimal numbers (0-7), let's use the 3-bit binary encodings (which also conveniently represent the path to each leaf!), to make the bit-reversal more visually obvious, since we'll see the bits being reversed: [000, 001, 010, 011 | 100, 101, 110, 111].

Phase 1 is the recursive step: first, bit-reverse the inner 2-bit paths within each subtree separately. This gives [000, 010, 001, 011 | 100, 110, 101, 111]. But of course, we're not done: each leaf now is still in its original subtree (the inner bits are reversed, but the root bit is still sitting at the top).

That's where phase 2 comes in, with the actual/main insight: we need to stop grouping leaves by which half/subtree they came from, and instead regroup corresponding positions across the two halves. It's a "zip then flatten" operation: pair 000 with 100 (first leaf of each subtree), pair 010 with 110 (second leaf of each subtree), etc (each pair share the same path suffix, the only difference is their first root bit, and pairing them means this first root bit is now pushed to the leaf level). This gives [000, 100, 010, 110, 001, 101, 011, 111], and we can check that each leaf's label is now indeed the reverse of its position: 001 sits at position 100, 110 sits at position 011, etc. So what we did, concretely, was turn the original "which subtree was I in?" bit (first/root bit) into the final "which leaf sibling am I?" bit (last/leaf bit): the top-level choice slid down to the leaf level. (In tree form, this is the non-obvious move from Node (Node a b) (Node c d) to the recursive cross-pairing on Node a c and Node b d.)

In Haskell, the two-function version looks like this:

invert (Leaf x) = Leaf x
invert (Node l r) = merge (invert l) (invert r)
  where
    merge (Leaf x) (Leaf y)         = Node (Leaf x) (Leaf y)
    merge (Node a b) (Node c d)     = Node (merge a c) (merge b d)

invert handles phase 1 (reversal of inner paths), and merge handles phase 2 (the cross-pairing that pushes b to the bottom). This one is easier to understand, but it violates Victor's original single-function constraint because it uses a non-trivial extra recursive helper merge. In comparison, this is a valid solution found by o3 11 months ago:

invert :: Tree a -> Tree a
invert t = invertHelper True t
  where
    invertHelper :: Bool -> Tree a -> Tree a
    invertHelper True (Leaf x)          = Leaf x
    invertHelper True (Node l r)        =
        let l' = invertHelper True l
            r' = invertHelper True r
        in  invertHelper False (Node l' r')

    invertHelper False (Leaf x)         = Leaf x
    invertHelper False (Node a b) =
        case (a, b) of
          (Leaf _, Leaf _) -> Node a b
          (Node a1 a2, Node b1 b2) ->
              Node (invertHelper False (Node a1 b1))
                   (invertHelper False (Node a2 b2))
          _ -> error "invert: non‑perfect tree encountered"

The (legal) extra bit/boolean of state in the invertHelper :: Bool -> Tree a -> Tree a auxiliary function tracks which phase we're in: True = "still recursing into subtrees", False = "now cross-pairing". So the whole difficulty is discovering this decomposition, inventing the cross-pairing operation, and realising that this allowed extra bit of state can encode switching between the two phases.

As a bonus, here is my hand-written invertHuman that solves it without the extra bit of state. I used it as my own reference implementation when developing the testing setup:

invertHuman (Node l@(Node _ _) r@(Node _ _)) =
  let Node ll lr = invertHuman l
      Node rl rr = invertHuman r
  in Node (invertHuman (Node (invertHuman ll) (invertHuman rl)))
          (invertHuman (Node (invertHuman lr) (invertHuman rr)))
invertHuman t = t

It merges both phases: after invertHuman l and invertHuman r, the children ll, lr, rl, rr are already bit-reversed. To do the cross-pairing without a separate (forbidden!) merge function, it forms Node (invertHuman ll) (invertHuman rl) and similarly on the right. The reason for the extra invertHuman calls on ll, lr, rl, rr is that bit-reversal is involutive (its own inverse): applying it once undoes the earlier inner reversal, so that the outer call can then perform the right reversal-and-cross-pairing at the larger scale. Strictly harder than the OG extra-bit-of-state version, this is what my "no-helper prompt" was hoping the AI models could find on their own.

So, for me, forbidden help would be anything that gives away any of the structure mentioned above: telling the model there are two phases, telling it what the boolean means operationally, telling it to merge or interleave corresponding branches from the two halves, or spelling out the Node a c / Node b d cross-pairing pattern. My prompts do none of that because they ask to write the solution in this form:

-- Implement the `invert` function as follows:
invert :: Tree a -> Tree a
invert tree = invertHelper tree True
  where
    invertHelper :: Bool -> Tree a -> Tree a
    invertHelper flag tree = undefined  -- Replace 'undefined' with your implementation

-> the one-line wrapper around the inner function with one extra boolean (legal help) does not reveal any of the above (it only preserves the one-bit allowance that Victor had already allowed). As a comparison, I think this clarifies why msukiasyan's follow-up instruction was forbidden help: their AI model had already discovered the two phases as two separate functions (invert and combine). The human then told the model to "refactor into a single function with one boolean", which gave away the phase encoding for free (my prompts don't do this: the model starts from a blank invertHelper :: Bool -> Tree a -> Tree a/invert_helper(tree, flag) and has to discover both phases from scratch, which faithfully preserves the original difficulty IMO).

VictorTaelin/ai_reasoning_challenge_v2.md

THE PROBLEM

WHY THIS PROBLEM?

THE CHALLENGE

RULES

APPROVED PROMPTS

Agda Version

TypeScript Version

CONCLUSION

VictorTaelin commented Dec 1, 2024

Uh oh!

youqad commented Dec 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TheMagnumOpuss commented Dec 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lhemerly commented Jan 14, 2025

Uh oh!

VictorTaelin commented Jan 18, 2025

Uh oh!

akshit-1729 commented Jan 28, 2025

Test with the second example

Uh oh!

TheMagnumOpuss commented Jan 31, 2025

Uh oh!

afafw commented Feb 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afafw commented Feb 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afafw commented Feb 1, 2025

Uh oh!

TheMagnumOpuss commented Apr 16, 2025

Uh oh!

TheMagnumOpuss commented Apr 16, 2025

Uh oh!

TheMagnumOpuss commented Apr 16, 2025

Uh oh!

youqad commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

youqad commented Apr 17, 2025

Uh oh!

youqad commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afafw commented May 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

youqad commented Jun 1, 2025

Uh oh!

boutell commented Jun 2, 2025

Uh oh!

Lorenzobattistela commented Feb 16, 2026

Uh oh!

youqad commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Lorenzobattistela commented Feb 16, 2026

Uh oh!

Lorenzobattistela commented Feb 16, 2026

Uh oh!

Lorenzobattistela commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VictorTaelin commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

boutell commented Mar 5, 2026

Uh oh!

youqad commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Lorenzobattistela commented Mar 6, 2026

Uh oh!

youqad commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

youqad commented Dec 9, 2024 •

edited

Loading

TheMagnumOpuss commented Dec 9, 2024 •

edited

Loading

afafw commented Feb 1, 2025 •

edited

Loading

afafw commented Feb 1, 2025 •

edited

Loading

youqad commented Apr 17, 2025 •

edited

Loading

youqad commented Apr 17, 2025 •

edited

Loading

afafw commented May 24, 2025 •

edited

Loading

youqad commented Feb 16, 2026 •

edited

Loading

Lorenzobattistela commented Feb 16, 2026 •

edited

Loading

VictorTaelin commented Mar 5, 2026 •

edited

Loading

youqad commented Mar 5, 2026 •

edited

Loading

youqad commented Mar 7, 2026 •

edited

Loading

youqad commented Mar 8, 2026 •

edited

Loading