
<h3>Optimizing 3proxy for high load</h3>
<p>Precaution 1: 3proxy was not initially developed for high load and is positioned as a SOHO product; the main reason is the "one connection - one thread" model 3proxy uses. 3proxy is known to work with over 200,000 connections under proper configuration, but use it in a production environment under high load at your own risk and do not expect too much.
<p>Precaution 2: This documentation is incomplete and not sufficient. High loads may require very specific system tuning including, but not limited to, specific or customized kernels, builds, settings, sysctls, options, etc. None of this is covered by this documentation.
<h4>Configuring 'maxconn'</h4>
The number of simultaneous connections per service is limited by the 'maxconn' option.
The default maxconn value since 3proxy 0.8 is 500. You may want to set 'maxconn'
to a higher value. Under this configuration:
<pre>
maxconn 1000
proxy -p3129
proxy -p3128
socks
</pre>
maxconn for every service is 1000, and there are 3 services running
(2 proxy and 1 socks), so for all services there can be up to 3000
simultaneous connections to 3proxy.
<p>Avoid setting 'maxconn' to an arbitrarily high value; it should be carefully
chosen to protect the system and proxy from resource exhaustion. Setting maxconn
above available resources can lead to denial-of-service conditions.
<h4>Understanding resource requirements</h4>
Each running service requires:
<ul>
<li>1*thread (process)
<li>1*socket (file descriptor)
<li>1 stack memory segment + some heap memory, ~64K-128K depending on the system
</ul>
Each connected client requires:
<ul>
<li>1*thread (process)
<li>2*sockets (file descriptors). For FTP, 4 sockets are required.
<br>Under Linux, since 0.9, splice() is used. It's much more efficient, but requires
<br>2*sockets (file descriptors) + 2*pipes (file descriptors) = 4 file descriptors.
<br>For FTP, 4 sockets and 2 pipes are required with splice().
<br>Up to 128K (up to 256K in the case of splice()) of kernel buffer memory. This is a theoretical maximum; actual numbers depend on connection quality and traffic amount.
<br>1 additional socket (file descriptor) during name resolution for non-cached names
<br>1 additional socket during RADIUS authentication or logging.
<li>1*ephemeral port (3*ephemeral ports for an FTP connection).
<li>1 stack memory segment of ~32K-128K depending on the system + at least 16K and up to a few MB (for 'proxy' and 'ftppr') of heap memory. If you are short of memory, prefer 'socks' to 'proxy' and 'ftppr'.
<li>a lot of system buffers, especially in the case of slow network connections.
</ul>
Also, additional resources like system buffers are required for network activity.
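As a rough worked estimate based on the per-connection numbers above (assuming one 'proxy' service on Linux with splice() and maxconn 1000; actual figures depend on the system and traffic):
<pre>
file descriptors: 1000 * 4  = 4000      (+ extras for resolution / logging)
threads:          1000
stack memory:     up to 1000 * 128K ~= 128 MB (worst case)
kernel buffers:   up to 1000 * 256K ~= 256 MB (theoretical maximum)
ephemeral ports:  1000
</pre>
Choose ulimits (NOFILE, NPROC, memory) and 'maxconn' with headroom above such an estimate.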
<h4>Setting ulimits</h4>
Hard and soft ulimits must be set above the calculated requirements. Under Linux, you can
check the limits of a running process with
<pre>
cat /proc/PID/limits
</pre>
where PID is the pid of the process.
Validate that ulimits match your expectations, especially if you run 3proxy under a dedicated account,
by adding e.g.
<pre>
system "ulimit -Ha >>/tmp/3proxy.ulim.hard"
system "ulimit -Sa >>/tmp/3proxy.ulim.soft"
</pre>
at the beginning (before the first service is started) and at the end of the config file.
Perform both a hard restart (that is, kill and start the 3proxy process) and a soft restart
by sending SIGUSR1 to the 3proxy process, and check that the ulimits recorded to the files match your
expectations. In systemd-based distros (e.g. latest Debian / Ubuntu) changing limits.conf
is not enough; limits must be adjusted in the systemd configuration, e.g. by setting
<pre>
DefaultLimitDATA=infinity
DefaultLimitSTACK=infinity
DefaultLimitCORE=infinity
DefaultLimitRSS=infinity
DefaultLimitNOFILE=102400
DefaultLimitAS=infinity
DefaultLimitNPROC=10240
DefaultLimitMEMLOCK=infinity
</pre>
in user.conf / system.conf.
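Alternatively, if 3proxy runs as a systemd service, limits can be raised for that unit alone via a drop-in override rather than the global defaults (the unit and file names below are assumptions; adjust to your installation):
<pre>
# /etc/systemd/system/3proxy.service.d/limits.conf (hypothetical path)
[Service]
LimitNOFILE=102400
LimitNPROC=10240
LimitSTACK=infinity
</pre>
Run "systemctl daemon-reload" and restart the service afterwards, then re-check with /proc/PID/limits.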
<h4>Extending system limitations</h4>
Check the manuals / documentation for your system's limitations, e.g. the system-wide limit for the number of open files
(fs.file-max in Linux). You may need to change sysctls or even rebuild the kernel from source.
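For example, under Linux the system-wide open-file limit can be raised via sysctl (the value below is illustrative):
<pre>
sysctl -w fs.file-max=500000
# make it persistent across reboots:
echo "fs.file-max=500000" >> /etc/sysctl.conf
</pre>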
<p>
To help with socket-based system-dependent settings, since 0.9-devel 3proxy supports different
socket options which can be set via the -ol option for the listening socket, -oc for the proxy-to-client
socket and -os for the proxy-to-server socket. Example:
<pre>
proxy -olSO_REUSEADDR,SO_REUSEPORT -ocTCP_TIMESTAMPS,TCP_NODELAY -osTCP_NODELAY
</pre>
Available options are system-dependent.
<h4>Using 3proxy in a virtual environment</h4>
If 3proxy is used in a VPS environment, there can be additional limitations.
For example, kernel resources / system CPU usage / IOCTLs can be limited in a different way, and this can become a bottleneck.
Since 0.9-devel, 3proxy uses splice() by default on Linux. splice() prevents network traffic from being copied from
kernel space to the 3proxy process and generally increases throughput, especially in the case of high-volume traffic. But
since some work is moved to the kernel, it requires up to 2 times more kernel resources in terms of CPU, memory and IOCTLs.
Use the -s0 option to disable splice() usage for a given service if kernel resources are additionally limited and this
limitation is a bottleneck, e.g.
<pre>
socks -s0
</pre>
<h4>Extending the ephemeral port range</h4>
Check the ephemeral port range for your system and extend it to the number of
ports required.
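Under Linux, the range can be checked and extended via sysctl (the values are illustrative):
<pre>
sysctl net.ipv4.ip_local_port_range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
</pre>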
The ephemeral range is always limited to the maximum number of ports (64K). To extend the
number of outgoing connections above this limit, extending the ephemeral port range
is not enough; you need additional actions:
<ol>
<li> Configure multiple outgoing IPs
<li> Make sure 3proxy is configured to use different outgoing IPs, either by setting
the external IP via RADIUS
<pre>
radius secret 1.2.3.4
auth radius
proxy
</pre>
or by using multiple services with different external
interfaces, example:
<pre>
allow user1,user11,user111
proxy -p1001 -e1.1.1.1
flush
allow user2,user22,user222
proxy -p1001 -e1.1.1.2
flush
allow user3,user33,user333
proxy -p1001 -e1.1.1.3
flush
allow user4,user44,user444
proxy -p1001 -e1.1.1.4
flush
</pre>
or via "parent extip" rotation,
e.g.
<pre>
allow user1,user11,user111
parent 1000 extip 1.1.1.1 0
allow user2,user22,user222
parent 1000 extip 1.1.1.2 0
allow user3,user33,user333
parent 1000 extip 1.1.1.3 0
allow user4,user44,user444
parent 1000 extip 1.1.1.4 0
proxy
</pre>
or
<pre>
allow *
parent 250 extip 1.1.1.1 0
parent 250 extip 1.1.1.2 0
parent 250 extip 1.1.1.3 0
parent 250 extip 1.1.1.4 0
socks
</pre>
Under recent Linux versions you can also start multiple services with different
external addresses on a single port with SO_REUSEPORT on the listening socket to
evenly distribute incoming connections between outgoing interfaces:
<pre>
socks -olSO_REUSEPORT -p3128 -e 1.1.1.1
socks -olSO_REUSEPORT -p3128 -e 1.1.1.2
socks -olSO_REUSEPORT -p3128 -e 1.1.1.3
socks -olSO_REUSEPORT -p3128 -e 1.1.1.4
</pre>
For web browsing the last two examples are not recommended, because the same client can get
a different external address for different requests; you should choose the external
interface with user-based rules instead.
<li> You may need additional system-dependent actions to use the same port on different IPs,
usually by adding the SO_REUSEADDR (SO_PORT_SCALABILITY for Windows) socket option to
the external socket. This option can be set (since 0.9-devel) with the -os option:
<pre>
proxy -p3128 -e1.2.3.4 -osSO_REUSEADDR
</pre>
The behavior of SO_REUSEADDR and SO_REUSEPORT differs between systems,
even between different kernel versions, and can lead to unexpected results.
Specifics are described <a href="https://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t">here</a>.
Use these options only if actually required and if you fully understand the possible
consequences. E.g. SO_REUSEPORT can help to establish more connections than the
number of client ports available, but it can also lead to a situation where connections
randomly fail due to ip+port pair collisions if the remote or local system
doesn't support this trick.
</ol>
<h4>Setting stacksize</h4>
'stacksize' is a size added to all stack allocations and can be both positive and
negative. Stack is required for function calls. 3proxy itself doesn't require a large
stack, but it can be required if some
poorly-written libc, 3rd-party libraries or system functions are called. There is known
dirty code in Unix ODBC
implementations and built-in DNS resolvers, especially in the case of IPv6 and a large
number of interfaces. Under most 64-bit systems extending stacksize will lead
to additional address space usage but does not require actually committed memory,
so you can increase stacksize to a relatively large value (e.g. 1024000) without
the need to add physical memory,
but this is system/libc dependent and requires additional testing under your
installation. Don't forget about memory-related ulimits.
<p>For 32-bit systems, address space can be a bottleneck you should consider. If
you're short of address space you can try to use a negative stack size.
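A minimal sketch of the directive's usage (the value is illustrative; this assumes 'stacksize' is set before the services it should affect):
<pre>
stacksize 1024000
proxy
socks
</pre>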
<h4>Known system issues</h4>
There are known race condition issues in the Linux / glibc resolver. The probability
of a race condition increases under configurations with IPv6, a large number of interfaces,
IP addresses or resolvers configured. In this case, install a local recursor and
use 3proxy's built-in resolver (nserver / nscache / nscache6).
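A sketch of such a configuration, assuming a local recursor listening on 127.0.0.1 (cache sizes are illustrative):
<pre>
nserver 127.0.0.1
nscache 65536
nscache6 65536
</pre>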
<h4>Do not use public resolvers</h4>
Public resolvers like the ones from Google have rate limits. For a large number of
requests, install a local caching recursor (ISC BIND named, PowerDNS Recursor, etc).
<h4>Avoid large lists</h4>
Currently, 3proxy is not optimized to use large ACLs, user lists, etc. All lists
are processed linearly. In the devel version you can use RADIUS authentication to avoid
user lists and ACLs in 3proxy itself. Also, RADIUS allows you to easily set the outgoing IP
on a per-user basis, or implement more sophisticated logic.
RADIUS is a new beta feature; test it before using it in production.
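A minimal sketch moving authentication out of in-config user lists and into RADIUS (server address and secret are placeholders, mirroring the earlier RADIUS example):
<pre>
radius secret 1.2.3.4
auth radius
proxy
</pre>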
<h4>Avoid changing configuration too often</h4>
Every configuration reload requires additional resources. Do not make frequent
changes, like user addition/deletion via configuration; use alternative
authentication methods instead, like RADIUS.
<h4>Consider using 'noforce'</h4>
The 'force' behavior (default) re-authenticates all connections after a
configuration reload, which may be resource-consuming with a large number of
connections. Consider adding the 'noforce' command before services are started
to prevent connection re-authentication.
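A minimal sketch, with 'noforce' placed before the services it should apply to:
<pre>
noforce
proxy
socks
</pre>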
<h4>Do not monitor configuration files directly</h4>
Using a configuration file directly in 'monitor' can lead to a race condition where
the configuration is reloaded while the file is still being written.
To avoid race conditions:
<ol>
<li> Update config files only if there is no lock file
<li> Create the lock file when the 3proxy configuration is updated, e.g. with
"touch /some/path/3proxy/3proxy.lck". If you generate config files
asynchronously, e.g. by a user's request via web, you should consider
implementing existence checking and file creation as an atomic operation.
<li> Add
<pre>
system "rm /some/path/3proxy/3proxy.lck"
</pre>
at the end of the config file to remove it after the configuration is successfully loaded
<li> Use a dedicated version file to monitor, e.g.
<pre>
monitor "/some/path/3proxy/3proxy.ver"
</pre>
<li> After the config is updated, change the version file for 3proxy to reload the configuration,
e.g. with "touch /some/path/3proxy/3proxy.ver".
</ol>
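The update side of the steps above can be sketched as a shell script (paths follow the examples above; "generate_config" is a hypothetical placeholder for whatever produces your configuration):
<pre>
#!/bin/sh
LCK=/some/path/3proxy/3proxy.lck
# wait until 3proxy has removed the lock left by the previous update
while [ -e "$LCK" ]; do sleep 1; done
touch "$LCK"                                      # take the lock
generate_config > /some/path/3proxy/3proxy.cfg    # write the new config (hypothetical generator)
touch /some/path/3proxy/3proxy.ver                # bump the version file; 3proxy reloads and removes the lock
</pre>
Note that the existence check and lock creation above are not atomic; as step 2 says, with concurrent writers use an atomic primitive (e.g. "mkdir" as a lock) instead.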
<h4>Use TCP_NODELAY to speed up connections with small amounts of data</h4>
If most requests require an exchange of small amounts of data in both directions
without the need for bandwidth, e.g. messengers or small web requests,
you can eliminate Nagle's algorithm delay with the TCP_NODELAY flag. Usage example:
<pre>
proxy -osTCP_NODELAY -ocTCP_NODELAY
</pre>
sets TCP_NODELAY for client (-oc) and server (-os) connections.
<p>Do not use TCP_NODELAY on slow connections with high delays, or when the
connection bandwidth is the bottleneck.
<h4>Use splice to speed up transfers of large amounts of data</h4>
splice() allows copying data between connections without copying it into the process
address space. It can speed up the proxy on high-bandwidth connections if most
connections require large data transfers. Splice is enabled by default on Linux
since 0.9; "-s0" disables splice usage. Example:
<pre>
proxy -s0
</pre>
Splice is only available on Linux. Splice requires more system buffers and file descriptors,
and produces more IOCTLs, but reduces process memory and overall CPU usage.
Disable splice if there are a lot of short-lived connections with no bandwidth
requirements.
<p>Use splice only on high-speed connections (e.g. 10GbE), if the processor, memory speed or
system bus are bottlenecks.
<p>TCP_NODELAY and splice are not contrary to each other and should be combined on
high-speed connections.