AlphaApple - FruitBox DQN
This model plays the FruitBox (Fruit Box) puzzle game hosted on Gamesaien. It predicts Q-values over all axis-aligned rectangles on a 10x17 board. A valid action is a rectangle whose cell sum is exactly 10, so you must apply an action mask to filter invalid rectangles before selecting the best move.
Quick facts
- Board: 10x17, values 0-9 (0 means empty)
- Action space: 8415 axis-aligned rectangles
- Input: one-hot board with shape [1, 10, 10, 17]
- Output: Q-values for all rectangles
- Masking: required to remove invalid rectangles
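The action-space size above can be checked by enumerating every rectangle directly. This is a minimal sketch; the function name and the enumeration order are assumptions made for illustration, and the index order the trained model actually uses is defined in the AlphaApple repository and may differ.

```python
from itertools import combinations_with_replacement

ROWS, COLS = 10, 17

def enumerate_rectangles(rows=ROWS, cols=COLS):
    """Return every axis-aligned rectangle as (r1, c1, r2, c2), corners inclusive.

    NOTE: this ordering is an assumption for illustration only; it is not
    confirmed to match the model's action indexing.
    """
    # All (start, end) pairs with start <= end along each axis.
    row_spans = list(combinations_with_replacement(range(rows), 2))  # 55 spans
    col_spans = list(combinations_with_replacement(range(cols), 2))  # 153 spans
    return [(r1, c1, r2, c2) for r1, r2 in row_spans for c1, c2 in col_spans]

rects = enumerate_rectangles()
print(len(rects))  # 8415 = 55 * 153
```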
Files in this repo
- model.pth: PyTorch checkpoint dict with policy_net, target_net, optimizer
- model.onnx: Exported ONNX model for browser/runtime inference
How to use (PyTorch)
# Model definition is in https://github.com/kbsooo/AlphaApple (src/models.py)
import torch
from src.models import FruitBoxDQN
rows, cols = 10, 17
n_actions = 55 * 153 # (rows*(rows+1)/2) * (cols*(cols+1)/2) = 8415
model = FruitBoxDQN(rows, cols, n_actions)
ckpt = torch.load("model.pth", map_location="cpu")
state = ckpt["policy_net"] if "policy_net" in ckpt else ckpt
model.load_state_dict(state)
model.eval()
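The model expects a one-hot board of shape [1, 10, 10, 17]. The sketch below shows one way to build that input with NumPy. It assumes the second axis indexes the cell value (0 to 9) and the last two axes are rows and columns; that layout is inferred from the stated shape, not confirmed by the source, and `one_hot_board` is a hypothetical helper name.

```python
import numpy as np

def one_hot_board(board):
    """Encode a 10x17 integer board (values 0-9) as a [1, 10, 10, 17] float array.

    ASSUMPTION: axis 1 is the cell value (channel), axes 2 and 3 are row and
    column. Check the repository's preprocessing before relying on this.
    """
    board = np.asarray(board, dtype=np.int64)
    if board.shape != (10, 17):
        raise ValueError(f"expected a 10x17 board, got {board.shape}")
    if board.min() < 0 or board.max() > 9:
        raise ValueError("cell values must be in 0..9")
    planes = np.zeros((1, 10, 10, 17), dtype=np.float32)
    rows, cols = np.indices(board.shape)
    # Set exactly one channel per cell: the channel equal to the cell's value.
    planes[0, board, rows, cols] = 1.0
    return planes
```

The result can be wrapped with `torch.from_numpy(...)` before being passed to the model loaded above.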
How to use (ONNX / browser)
const session = await ort.InferenceSession.create("model.onnx");
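// boardOneHot (hypothetical name): Float32Array of length 1*10*10*17, one-hot encoded
const input = new ort.Tensor("float32", boardOneHot, [1, 10, 10, 17]);
const output = await session.run({ input });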
// output.output.data: Q-values for 8415 rectangles
Action masking (required)
You must mask invalid rectangles before selecting an action. A rectangle is valid if the sum of its cells equals 10. Without the mask, the model can pick illegal moves.
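One way to build the mask is with a summed-area table, which gives each rectangle's sum in constant time. The sketch below is illustrative only: `valid_action_mask` and `select_action` are hypothetical names, and the rectangle ordering (nested loops over r1, r2, c1, c2) is an assumption that must be replaced with the ordering used in the AlphaApple repository for the mask to line up with the model's Q-values.

```python
import numpy as np

def valid_action_mask(board, target=10):
    """Return (mask, rects): mask[i] is True when rects[i] sums to `target`.

    ASSUMPTION: rectangles are ordered by nested loops over r1, r2, c1, c2.
    The model's real action indexing may differ.
    """
    board = np.asarray(board, dtype=np.int64)
    rows, cols = board.shape
    # Summed-area table with a zero border: prefix[r, c] == board[:r, :c].sum()
    prefix = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    prefix[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    mask, rects = [], []
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    total = (prefix[r2 + 1, c2 + 1] - prefix[r1, c2 + 1]
                             - prefix[r2 + 1, c1] + prefix[r1, c1])
                    rects.append((r1, c1, r2, c2))
                    mask.append(total == target)
    return np.array(mask, dtype=bool), rects

def select_action(q_values, mask):
    """Greedy choice among valid rectangles; None when no valid move exists."""
    if not mask.any():
        return None
    # Invalid actions get -inf so argmax can never select them.
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))
```

Replacing invalid Q-values with negative infinity, rather than zero, keeps the choice correct even when every valid Q-value is negative.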
Training details
- Environment: FruitBoxEnv (implemented in envs/fruitbox_env.py, class FruitBoxEnvImproved)
- Board generator: BackwardBoardGenerator (solvable boards)
- Curriculum: target coverage ramps from 0.3 to 0.95 in steps of 0.1
- Optimizer: Adam, lr=1e-4; discount factor gamma=0.99
- Episodes: 10k (Colab integrated script)
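The curriculum levels listed above can be written out as a short schedule. This is only a sketch: `coverage_schedule` is a hypothetical helper, and because stepping from 0.3 by 0.1 never lands exactly on 0.95, the final clamp to 0.95 is an assumption. The source also does not say when training advances from one level to the next.

```python
def coverage_schedule(start=0.30, stop=0.95, step=0.10):
    """List target-coverage levels from `start` up to `stop`.

    ASSUMPTION: the last level is clamped to `stop`, since start + k * step
    does not hit 0.95 exactly. Works in integer hundredths to avoid float drift.
    """
    lo, hi, inc = round(start * 100), round(stop * 100), round(step * 100)
    levels = list(range(lo, hi, inc))  # 30, 40, ..., 90
    levels.append(hi)                  # final clamped level: 95
    return [level / 100 for level in levels]

schedule = coverage_schedule()
print(schedule)  # [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
```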
Limitations
- Trained on generated boards; performance may vary on edge cases.
- Requires an accurate action mask and correct board extraction.