NickupAI/alphabypass3
+---
+license: mit
+language:
+- ru
+- en
+tags:
+- reinforcement-learning
+- ppo
+- network
+- privacy
+- censorship-circumvention
+- vless
+- research
+pipeline_tag: reinforcement-learning
+library_name: pytorch
+---
+# AlphaBypass.3 🧠
+<a href="https://ibb.co/m5Gx6tGx"><img src="https://i.ibb.co/MksXMpsX/logo.png" alt="logo"></a>
+> *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*
+## What is this?
+AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover effective [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's federal communications regulator and internet censor.
+Instead of manually tuning parameters, a neural network figures it out by trial and error against a real, live DPI system. It learns which combinations of transport, fingerprint, domain, and other parameters actually work.
+**This is a research project** studying network censorship and its automated circumvention through adversarial machine learning. Any resemblance to practical use is purely coincidental :)
+---
+## Model Details
+| Property | Value |
+|----------|-------|
+| Architecture | MLP, 3×512 hidden layers with LayerNorm |
+| Parameters | ~787K |
+| Algorithm | PPO (Proximal Policy Optimization) |
+| Action space | Mixed discrete + continuous |
+| Observation space | 75-dimensional vector |
+| Training episodes | ~1,100 |
+| Target protocol | VLESS + REALITY (xray-core) |
+| Success rate | **93%** |
+| Avg reward | +0.81 (scale: −1.0 to +1.0) |
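The "mixed discrete + continuous" action space can be read as several categorical choices (for example transport and fingerprint) plus a few real-valued parameters. Below is a minimal, stdlib-only sketch of how one action might be drawn from such a policy's outputs. The head sizes, the Gaussian parameterization, and the name `sample_mixed_action` are assumptions for illustration, not taken from the released code.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_mixed_action(head_logits, mu, log_std, rng, greedy=False):
    """Return (discrete indices, continuous values) for one action.

    head_logits: one list of logits per discrete head (hypothetical layout)
    mu, log_std: mean and log standard deviation per continuous dimension
    greedy:      take argmax / mean instead of sampling, as at evaluation time
    """
    discrete = []
    for logits in head_logits:
        probs = softmax(logits)
        if greedy:
            discrete.append(max(range(len(probs)), key=probs.__getitem__))
        else:
            discrete.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    if greedy:
        continuous = list(mu)
    else:
        continuous = [rng.gauss(m, math.exp(s)) for m, s in zip(mu, log_std)]
    return discrete, continuous

# Made-up heads: 3 transport options and 2 fingerprint options.
heads = [[2.0, 0.1, -1.0], [0.0, 3.0]]
print(sample_mixed_action(heads, mu=[0.5], log_std=[-1.0],
                          rng=random.Random(0), greedy=True))
# → ([0, 1], [0.5])
```

During training the agent would sample (`greedy=False`) to explore; the usage example further down corresponds to the greedy path.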
+---
+## Reward Function
+```python
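+# connection_quality and log_speed_score are helper functions not shown here.
+def compute_reward(metrics, baseline_mbps=32.0):
+    """Score one probed configuration on a scale from -1.0 to +1.0."""
+    # A configuration that never connects gets the minimum reward.
+    if not metrics.connected:
+        return -1.0
+    # Weighted sum of three components; the weights sum to 1.0.
+    r = 0.50 * connection_quality(metrics)   # ping, loss, connect time
+    r += 0.35 * metrics.stability_ratio      # probe success rate
+    r += 0.15 * log_speed_score(metrics, baseline_mbps)  # speed vs. baseline
+    return r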
+```
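The snippet above relies on two helpers that are not shown. To make the weighting concrete, here is a self-contained sketch with stand-in helpers. The `Metrics` fields other than `connected` and `stability_ratio`, the 500 ms and 5 s cut-offs, and the exact shape of both helper functions are assumptions chosen for illustration, not the project's actual definitions.

```python
import math
from dataclasses import dataclass

@dataclass
class Metrics:
    # Only `connected` and `stability_ratio` appear in the original snippet;
    # the remaining field names are hypothetical.
    connected: bool
    stability_ratio: float = 0.0   # fraction of probes that succeeded, 0..1
    ping_ms: float = 0.0
    loss_ratio: float = 0.0        # packet loss, 0..1
    connect_time_s: float = 0.0
    speed_mbps: float = 0.0

def clamp01(x):
    return max(0.0, min(1.0, x))

def connection_quality(m):
    """Assumed form: mean of three 0..1 scores for ping, loss and connect time."""
    ping_score = clamp01(1.0 - m.ping_ms / 500.0)           # 500 ms or worse scores 0
    loss_score = clamp01(1.0 - m.loss_ratio)
    connect_score = clamp01(1.0 - m.connect_time_s / 5.0)   # 5 s or worse scores 0
    return (ping_score + loss_score + connect_score) / 3.0

def log_speed_score(m, baseline_mbps):
    """Assumed form: log-scaled throughput that saturates at the baseline."""
    return clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))

def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r

blocked = Metrics(connected=False)
ideal = Metrics(connected=True, stability_ratio=1.0, speed_mbps=32.0)
print(compute_reward(blocked), compute_reward(ideal))
```

If all three components stay within [0, 1], as they do in this sketch, a configuration that connects scores between 0 and 1 and a blocked one scores −1, which is consistent with the reward scale listed in the model details.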
+---
+## Usage
+Requires [xray-core](https://github.com/XTLS/Xray-core).
+### Load and query the model
+```python
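+import numpy as np
+import torch
+
+from agent import PolicyNetwork
+from environment import decode_action
+
+# Load the trained policy. weights_only=False unpickles arbitrary Python
+# objects, so only load checkpoints you trust.
+policy = PolicyNetwork()
+ck = torch.load("best.pt", map_location="cpu", weights_only=False)
+policy.load_state_dict(ck["policy_state"])
+policy.eval()
+
+# Placeholder observation: a batch of one 75-dimensional vector.
+obs = torch.zeros(1, 75)
+with torch.no_grad():
+    logits, mu, _, _ = policy(obs)
+
+# Greedy decoding: argmax of each discrete head, plus the continuous output mu.
+discrete   = np.array([head.argmax(dim=-1).item() for head in logits])
+continuous = mu.squeeze(0).numpy()  # drop only the batch dimension
+config     = decode_action(discrete, continuous)
+
+print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
+print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")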
+```
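The example above hands the raw policy outputs to `decode_action`, whose internals are not shown. One plausible reading is a table lookup for each discrete head and a rescaling of each continuous value. The sketch below illustrates that idea only: the option tables, the port range, the head order, and the name `decode_action_sketch` are all hypothetical, and only the five field names come from the `print` calls above.

```python
from dataclasses import dataclass

# Hypothetical option tables; the released environment may use different ones.
TRANSPORTS = ["tcp", "grpc", "ws"]
FINGERPRINTS = ["chrome", "firefox", "safari"]
DOMAINS = ["example.com", "example.org"]
FRAGMENT_STRATEGIES = ["none", "tlshello"]
PORT_RANGE = (1024, 65535)

@dataclass
class ProxyConfig:
    transport_type: str
    fingerprint: str
    dest_domain: str
    fragment_strategy: str
    proxy_port: int

def decode_action_sketch(discrete, continuous):
    """Map head indices to named options and a value in [-1, 1] to a port."""
    t, f, d, s = (int(i) for i in discrete)
    # Clip the continuous output, then rescale it linearly onto the port range.
    x = max(-1.0, min(1.0, float(continuous[0])))
    lo, hi = PORT_RANGE
    port = lo + round((x + 1.0) / 2.0 * (hi - lo))
    return ProxyConfig(TRANSPORTS[t], FINGERPRINTS[f], DOMAINS[d],
                       FRAGMENT_STRATEGIES[s], port)

config = decode_action_sketch([1, 2, 0, 1], [-1.0])
print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
# → grpc:1024 → example.com
```

Clipping before rescaling keeps an out-of-range network output from producing an invalid port.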
+### Server config example
+```json
+{
+  "inbounds": [{
+    "port": 443,
+    "protocol": "vless",
+    "settings": {
+      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
+      "decryption": "none"
+    },
+    "streamSettings": {
+      "network": "grpc",
+      "security": "reality",
+      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
+      "realitySettings": {
+        "dest": "YOUR-SNI-DOMAIN:443",
+        "serverNames": ["YOUR-SNI-DOMAIN"],
+        "privateKey": "YOUR-PRIVATE-KEY",
+        "shortIds": ["YOUR-SHORT-ID"]
+      }
+    }
+  }],
+  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
+}
+```
+### Client config example
+```json
+{
+  "inbounds": [{
+    "port": 10808,
+    "protocol": "socks",
+    "settings": {"auth": "noauth", "udp": true}
+  }],
+  "outbounds": [{
+    "protocol": "vless",
+    "settings": {
+      "vnext": [{
+        "address": "YOUR-SERVER-IP",
+        "port": 443,
+        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
+      }]
+    },
+    "streamSettings": {
+      "network": "grpc",
+      "security": "reality",
+      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
+      "realitySettings": {
+        "fingerprint": "safari",
+        "serverName": "YOUR-SNI-DOMAIN",
+        "publicKey": "YOUR-PUBLIC-KEY",
+        "shortId": "YOUR-SHORT-ID"
+      }
+    }
+  }]
+}
+```
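The two configs only work together if several fields agree: the transport, the gRPC `serviceName`, the client's `serverName` and `shortId` (each must appear in the server's lists), and the user UUID. Here is a small sketch of a consistency check over configs shaped like the examples above. The function name `check_reality_pair` and the trimmed sample dicts are illustrative; the check cannot verify that `publicKey` matches `privateKey`, since that requires deriving the X25519 public key.

```python
def check_reality_pair(server, client):
    """Return a list of mismatches between a server and a client config."""
    problems = []
    s_in = server["inbounds"][0]
    c_out = client["outbounds"][0]
    s_stream, c_stream = s_in["streamSettings"], c_out["streamSettings"]
    s_real, c_real = s_stream["realitySettings"], c_stream["realitySettings"]

    if s_stream["network"] != c_stream["network"]:
        problems.append("transport (network) differs")
    s_service = s_stream.get("grpcSettings", {}).get("serviceName")
    c_service = c_stream.get("grpcSettings", {}).get("serviceName")
    if s_service != c_service:
        problems.append("gRPC serviceName differs")
    if c_real["serverName"] not in s_real["serverNames"]:
        problems.append("client serverName is not in server serverNames")
    if c_real["shortId"] not in s_real["shortIds"]:
        problems.append("client shortId is not in server shortIds")
    server_ids = {c["id"] for c in s_in["settings"]["clients"]}
    for user in c_out["settings"]["vnext"][0]["users"]:
        if user["id"] not in server_ids:
            problems.append("client UUID is not registered on the server")
    return problems

# Trimmed-down versions of the example configs, with placeholder values.
server = {"inbounds": [{
    "settings": {"clients": [{"id": "YOUR-UUID-HERE"}]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
        "realitySettings": {"serverNames": ["YOUR-SNI-DOMAIN"],
                            "shortIds": ["YOUR-SHORT-ID"]},
    },
}]}
client = {"outbounds": [{
    "settings": {"vnext": [{"users": [{"id": "YOUR-UUID-HERE"}]}]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
        "realitySettings": {"serverName": "YOUR-SNI-DOMAIN",
                            "shortId": "YOUR-SHORT-ID"},
    },
}]}
print(check_reality_pair(server, client))  # → []
```

For real files, the two dicts could be loaded with `json.load` before being passed to the function.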
+---
+## Limitations
+- DPI behavior varies by provider and region, so results may differ.
+- REALITY is fundamentally difficult to block without collateral damage. Some of the success may reflect protocol strength, not agent cleverness.
+- The agent keeps no memory between deployments, so it will not know about a DPI update rolled out overnight.
+- 787K parameters is intentional. The problem doesn't need GPT-6.
+---
+## Citation
+```bibtex
+@misc{alphabypass2026,
+  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
+  year  = {2026},
+  url   = {https://huggingface.co/YOUR_USERNAME/AlphaBypass}
+}
+```
+---
+## License
+MIT. Use responsibly, especially if you live somewhere VPNs are considered a thought crime.
+*"It's not about hiding. It's about the right to reach the open internet."*