NickupAI committed
Commit 5ff2e7f · verified · 1 Parent(s): c9d45a7

Upload README.md

Files changed (1)
  1. README.md +179 -3
README.md CHANGED
@@ -1,3 +1,179 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - ru
+ - en
+ tags:
+ - reinforcement-learning
+ - ppo
+ - network
+ - privacy
+ - censorship-circumvention
+ - vless
+ - research
+ pipeline_tag: reinforcement-learning
+ library_name: pytorch
+ ---
+
+ # AlphaBypass.3 🧠
+ <a href="https://ibb.co/m5Gx6tGx"><img src="https://i.ibb.co/MksXMpsX/logo.png" alt="logo"></a>
+ > *"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."*
+
+ ## What is this?
+
+ AlphaBypass is a **PPO-based reinforcement learning agent** trained to automatically discover high-performing [VLESS+REALITY](https://github.com/XTLS/Xray-core) proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's federal communications regulator and internet censor.
+
+ Instead of relying on manual parameter tuning, a neural network finds working settings by trial and error against a real, live DPI system. It learns which combinations of transport, fingerprint, destination domain, and other parameters actually get through.
+
+ **This is a research project** studying the automated circumvention of network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :)
+
+ ---
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | MLP, 3×512 hidden layers with LayerNorm |
+ | Parameters | ~787K |
+ | Algorithm | PPO (Proximal Policy Optimization) |
+ | Action space | Mixed discrete + continuous |
+ | Observation space | 75-dimensional vector |
+ | Training episodes | ~1,100 |
+ | Target protocol | VLESS + REALITY (xray-core) |
+ | Success rate | **93%** |
+ | Avg reward | +0.81 (scale: −1.0 to +1.0) |
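
As a rough sanity check on the table above, the sketch below counts the parameters of a trunk with three 512-unit linear layers, each followed by a LayerNorm with learnable scale and shift, on a 75-dimensional input. The exact layer layout is an assumption, and the helper names are illustrative rather than part of the project. Under that assumption the trunk alone accounts for about 567K parameters, which would leave roughly 220K of the reported ~787K for the output heads.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(width: int) -> int:
    """Learnable scale and shift of one LayerNorm."""
    return 2 * width


def trunk_params(obs_dim: int = 75, hidden: int = 512, layers: int = 3) -> int:
    """Parameter count of an MLP trunk: Linear -> LayerNorm, repeated.

    Assumed layout (not confirmed by the model card): one LayerNorm
    after every hidden Linear layer, no weight sharing.
    """
    total = 0
    n_in = obs_dim
    for _ in range(layers):
        total += linear_params(n_in, hidden) + layernorm_params(hidden)
        n_in = hidden
    return total


trunk = trunk_params()
print(f"trunk parameters: {trunk:,}")  # prints "trunk parameters: 567,296"
print(f"left for heads:   {787_000 - trunk:,} (approximate)")
```

The small size is consistent with the observation space: a 75-dimensional input does not need a deep or wide network to be mapped to a handful of configuration choices.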
+
+ ---
+
+ ## Reward Function
+
+ ```python
+ def compute_reward(metrics, baseline_mbps=32.0):
+     # A failed connection always receives the minimum reward.
+     if not metrics.connected:
+         return -1.0
+
+     # Weighted sum of three component scores (weights sum to 1.0).
+     r = 0.50 * connection_quality(metrics)  # ping, loss, connect time
+     r += 0.35 * metrics.stability_ratio     # probe success rate
+     r += 0.15 * log_speed_score(metrics, baseline_mbps)
+     return r
+ ```
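
The README does not show `connection_quality` or `log_speed_score`, so the following is only a sketch of how such helpers could be written so that each component stays in [0, 1] and the total reward stays within the stated −1.0 to +1.0 scale. The `Metrics` field names other than `connected` and `stability_ratio`, the normalisation constants (200 ms, 2 s), and the equal weighting inside `connection_quality` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    # Only `connected` and `stability_ratio` appear in the original snippet;
    # the remaining fields are hypothetical names chosen for this sketch.
    connected: bool
    stability_ratio: float = 0.0   # fraction of probes that succeeded, 0..1
    ping_ms: float = 0.0
    loss_ratio: float = 0.0        # packet loss, 0..1
    connect_time_s: float = 0.0
    speed_mbps: float = 0.0


def connection_quality(m: Metrics) -> float:
    """Average of three scores in [0, 1]; lower latency and loss score higher."""
    ping_score = 1.0 / (1.0 + m.ping_ms / 200.0)        # assumed 200 ms scale
    loss_score = 1.0 - min(max(m.loss_ratio, 0.0), 1.0)
    connect_score = 1.0 / (1.0 + m.connect_time_s / 2.0)  # assumed 2 s scale
    return (ping_score + loss_score + connect_score) / 3.0


def log_speed_score(m: Metrics, baseline_mbps: float) -> float:
    """Log-scaled throughput relative to the baseline, capped at 1.0."""
    if m.speed_mbps <= 0:
        return 0.0
    return min(1.0, math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics: Metrics, baseline_mbps: float = 32.0) -> float:
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r


blocked = Metrics(connected=False)
typical = Metrics(connected=True, stability_ratio=0.8, ping_ms=100.0,
                  loss_ratio=0.1, connect_time_s=1.0, speed_mbps=8.0)
print(compute_reward(blocked))            # prints -1.0
print(round(compute_reward(typical), 3))
```

With components bounded this way, any connected configuration scores between 0 and 1, so a blocked configuration is always strictly worse than a slow but working one.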
+
+ ---
+
+ ## Usage
+
+ Requires [xray-core](https://github.com/XTLS/Xray-core).
+
+ ### Load and query the model
+
+ ```python
+ import torch
+ import numpy as np
+ from agent import PolicyNetwork
+ from environment import decode_action
+
+ # weights_only=False unpickles arbitrary Python objects,
+ # so only load checkpoints you trust.
+ policy = PolicyNetwork()
+ ck = torch.load("best.pt", map_location="cpu", weights_only=False)
+ policy.load_state_dict(ck["policy_state"])
+ policy.eval()
+
+ # A single all-zero observation (batch of 1, 75 features).
+ obs = torch.zeros(1, 75)
+ with torch.no_grad():
+     logits, mu, _, _ = policy(obs)
+
+ # Greedy decoding: take the most likely option from each discrete head
+ # and the mean of the continuous head.
+ discrete = np.array([head.argmax().item() for head in logits])
+ continuous = mu.squeeze(0).numpy()  # drop the batch dimension only
+ config = decode_action(discrete, continuous)
+
+ print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
+ print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
+ ```
+
+ ### Server config example
+
+ ```json
+ {
+   "inbounds": [{
+     "port": 443,
+     "protocol": "vless",
+     "settings": {
+       "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
+       "decryption": "none"
+     },
+     "streamSettings": {
+       "network": "grpc",
+       "security": "reality",
+       "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
+       "realitySettings": {
+         "dest": "YOUR-SNI-DOMAIN:443",
+         "serverNames": ["YOUR-SNI-DOMAIN"],
+         "privateKey": "YOUR-PRIVATE-KEY",
+         "shortIds": ["YOUR-SHORT-ID"]
+       }
+     }
+   }],
+   "outbounds": [{"tag": "direct", "protocol": "freedom"}]
+ }
+ ```
+
+ ### Client config example
+
+ ```json
+ {
+   "inbounds": [{
+     "port": 10808,
+     "protocol": "socks",
+     "settings": {"auth": "noauth", "udp": true}
+   }],
+   "outbounds": [{
+     "protocol": "vless",
+     "settings": {
+       "vnext": [{
+         "address": "YOUR-SERVER-IP",
+         "port": 443,
+         "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
+       }]
+     },
+     "streamSettings": {
+       "network": "grpc",
+       "security": "reality",
+       "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
+       "realitySettings": {
+         "fingerprint": "safari",
+         "serverName": "YOUR-SNI-DOMAIN",
+         "publicKey": "YOUR-PUBLIC-KEY",
+         "shortId": "YOUR-SHORT-ID"
+       }
+     }
+   }]
+ }
+ ```
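
The two configs above only work together if several fields agree: the port, the user UUID, the transport, the gRPC `serviceName`, and the REALITY `serverName` and `shortId`. The sketch below checks those fields with the standard library only. The function name `reality_mismatches` is hypothetical and not part of the project or of xray-core, and the check assumes a single inbound and a single outbound, as in the examples. It cannot verify that `publicKey` matches the server's `privateKey`, since that would need an X25519 implementation.

```python
import copy

SERVER = {
    "inbounds": [{
        "port": 443,
        "protocol": "vless",
        "settings": {"clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
                     "decryption": "none"},
        "streamSettings": {
            "network": "grpc",
            "security": "reality",
            "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
            "realitySettings": {
                "dest": "YOUR-SNI-DOMAIN:443",
                "serverNames": ["YOUR-SNI-DOMAIN"],
                "privateKey": "YOUR-PRIVATE-KEY",
                "shortIds": ["YOUR-SHORT-ID"],
            },
        },
    }],
}

CLIENT = {
    "outbounds": [{
        "protocol": "vless",
        "settings": {"vnext": [{
            "address": "YOUR-SERVER-IP",
            "port": 443,
            "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}],
        }]},
        "streamSettings": {
            "network": "grpc",
            "security": "reality",
            "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
            "realitySettings": {
                "fingerprint": "safari",
                "serverName": "YOUR-SNI-DOMAIN",
                "publicKey": "YOUR-PUBLIC-KEY",
                "shortId": "YOUR-SHORT-ID",
            },
        },
    }],
}


def reality_mismatches(server: dict, client: dict) -> list:
    """Return human-readable problems; an empty list means the fields agree."""
    s_in = server["inbounds"][0]
    c_out = client["outbounds"][0]
    s_stream, c_stream = s_in["streamSettings"], c_out["streamSettings"]
    s_real, c_real = s_stream["realitySettings"], c_stream["realitySettings"]
    vnext = c_out["settings"]["vnext"][0]

    problems = []
    if vnext["port"] != s_in["port"]:
        problems.append("port differs")
    server_ids = {c["id"] for c in s_in["settings"]["clients"]}
    if not any(u["id"] in server_ids for u in vnext["users"]):
        problems.append("client UUID not registered on server")
    if c_stream["network"] != s_stream["network"]:
        problems.append("transport (network) differs")
    elif c_stream["network"] == "grpc":
        if c_stream["grpcSettings"]["serviceName"] != s_stream["grpcSettings"]["serviceName"]:
            problems.append("gRPC serviceName differs")
    if c_real["serverName"] not in s_real["serverNames"]:
        problems.append("serverName not in server's serverNames")
    if c_real["shortId"] not in s_real["shortIds"]:
        problems.append("shortId not in server's shortIds")
    return problems


print(reality_mismatches(SERVER, CLIENT))  # prints []

broken = copy.deepcopy(CLIENT)
broken["outbounds"][0]["streamSettings"]["realitySettings"]["shortId"] = "deadbeef"
print(reality_mismatches(SERVER, broken))  # prints ["shortId not in server's shortIds"]
```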
+
+ ---
+
+ ## Limitations
+
+ - DPI behavior varies by provider and region - results may differ.
+ - REALITY is fundamentally difficult to block without collateral damage. Some of the success may come from the protocol's strength rather than the agent's cleverness.
+ - No memory between deployments - the agent is unaware of overnight DPI updates.
+ - The ~787K parameter count is intentional. The problem doesn't need GPT-6.
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{alphabypass2026,
+   title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
+   year = {2026},
+   url = {https://huggingface.co/YOUR_USERNAME/AlphaBypass}
+ }
+ ```
+
+ ---
+
+ ## License
+
+ MIT. Use responsibly, especially if you live in a place where VPNs are considered a thought crime.
+
+ *"It's not about hiding. It's about the right to reach the open internet."*