@efahl - Ever since running the updated snort package, I have observed some output in dmesg that I have not seen before. The timing of the snort3 update might be coincidental and have nothing to do with it, but I wanted to share it and get some feedback.
Have you seen anything like this?
...
[ 413.506593] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.515055] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.523318] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.531514] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.539860] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.548125] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.556388] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.564653] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.572915] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 413.581113] nfnetlink_queue: nf_queue: full at 1024 entries, dropping packets(s)
[ 1005.157663] net_ratelimit: 929 callbacks suppressed
...
Well, that's interesting. Maybe it's the lb fanout algorithm (the fanout type used to default to hash)? Just guessing here: I'm not sure how to get internal state data out of snort, so it's hard to see what's going on with the queues, but that seems like about the only thing we've played with recently. (I wish snort had something like dnsmasq, where you kill -SIGUSR1 it and it spills its guts.)
Then there's the queue_maxlen in the nfq section of the config. I get the impression that that's on the user-space side of the queue (after nft delivers the packet out of kernel-space), but maybe increasing it would give enough space to avoid this?
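One way to get at least the kernel's view of the queues is the nfnetlink_queue proc file. The sketch below is only an illustration: the function name nfq_summary is made up here, the column meanings are my reading of the kernel's nfnetlink_queue source, and it assumes the proc file is present on your build.

```shell
#!/bin/sh
# Summarize per-queue state from the kernel's nfnetlink_queue proc file.
# Assumed column order (from the kernel source as I read it):
#   1 queue number        2 peer port id       3 packets currently queued
#   4 copy mode           5 copy range         6 dropped because queue was full
#   7 dropped on the way to user space         8 last packet id
nfq_summary() {
    # $1: file to read; defaults to the live proc file.
    awk '{ printf "queue=%s waiting=%s dropped_full=%s dropped_user=%s\n", $1, $3, $6, $7 }' \
        "${1:-/proc/net/netfilter/nfnetlink_queue}"
}
```

Calling nfq_summary with no argument reads the live file. If the dropped_full value keeps growing on only one or two of the queues, that would point at the distribution across queues rather than at overall throughput.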
Don't do this via the snort config. Pass it as a parameter at startup instead, because Snort tends to override or ignore config entries, and command-line parameters take precedence anyway. It would probably be better to create a startup script for Snort that is called from the init.d file and passes the values specified in /etc/config/snort to Snort as command-line parameters. This also makes it possible to bind Snort directly to specific processors and give it priority over other services. Here is my current Snort start script which is called by the init.d file:
As you can see, the script delays the start if the temporary rules directory does not exist, which has the advantage that all other services can load first. I have also added a check for a version file that my update script creates during the update. Snort is also bound, with the highest priority, to 3 of the 4 CPU cores.
I don't understand... are you saying that the reason for the drops is that snort is not running at a high enough priority when started by the init script?
Can be but does not have to be. As I said, I do everything with start parameters because I have noticed that Snort sometimes ignores entries in the config. That seems to have been the case for you: 1024 is the default queue length, but apparently your config specified something else. In general I have seen significant improvements when I set the priority and the CPU cores. Try it the way I do: bind Snort to 7 of your 8 cores (all except CPU 0) and reduce the number of queues to 7. You can use my script; you just have to adjust it.
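To make the pinning idea concrete, here is a minimal sketch. It is not the start script mentioned above. The helper names cpu_mask and pinned_cmd are made up for illustration, and it assumes nice and taskset are available on the router. It only prints the launch line instead of running it.

```shell
#!/bin/sh
# Turn a list of CPU numbers into the hex affinity mask that taskset accepts.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# Print (rather than run) a launch line.
# $1: niceness, $2: hex affinity mask, rest: the command to start.
pinned_cmd() {
    prio=$1 mask=$2
    shift 2
    echo "nice -n $prio taskset $mask $*"
}
```

For example, pinned_cmd -20 "$(cpu_mask 1 2 3)" snort -c /etc/snort/snort.lua prints a line that pins Snort to CPUs 1 to 3 and leaves CPU 0 free for everything else. A hex mask is used because it is the oldest taskset syntax; whether your taskset also accepts a core list is something to check locally.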
This is almost always due to typos in your config that lua evaluates to nil. For example, if you leave the quotes off a value, it is interpreted as an undefined variable with the value nil, and the setting falls back to snort's internal default. Here's an example:
daq = {
modules = {
{
name = nfq, -- Undefined variable, results in 'pcap' being used.
it should be
name = 'nfq',
This is just how lua works and there's no way to "fix" it, you just have to be extra careful when writing lua code.
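Since lua itself won't complain, a rough check from the shell can at least point at suspicious lines. The sketch below is a heuristic and not a lua parser: the function name lua_bareword_lint is made up, it only looks at simple "key = word" lines, it does not understand --[[ ]] block comments, and it will also report legitimate variable references such as default_variables.

```shell
#!/bin/sh
# List "key = bareword" assignments in a lua file, where the bareword is
# not true, false or nil. Such a word is a variable reference, and if the
# variable is undefined it silently evaluates to nil.
# $1: lua file to check. Output: "<line number>: <bareword>" per hit.
lua_bareword_lint() {
    awk '
        /^[ \t]*--/ { next }                    # skip whole-line comments
        {
            line = $0
            sub(/[ \t]--.*$/, "", line)         # drop a trailing comment
            if (match(line, /=[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*,?[ \t]*$/)) {
                word = substr(line, RSTART + 1, RLENGTH - 1)
                gsub(/[ \t,]/, "", word)
                if (word != "true" && word != "false" && word != "nil")
                    printf "%d: %s\n", NR, word
            }
        }
    ' "$1"
}
```

Running it on the generated file, for example lua_bareword_lint /var/snort.d/snort_conf.lua, gives a short list to review by hand. Anything on that list that was meant to be a string needs quotes.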
Does that apply to my include.snort as well? If so, there are many unquoted variables, see above. It seems that there are exceptions to this rule such as:
numbers
true and false
In any case, do you think unquoted variables could be to blame for my "full at 1024 entries" problem? Here is the autogenerated /var/snort.d/snort_conf.lua after I quoted a few of those variables.
/var/snort.d/snort_conf.lua
-- Do not edit, automatically generated. See /usr/share/snort/templates.
-- These must be defined before processing snort.lua
HOME_NET = [[ 10.9.8.0/24 10.9.7.0/24 10.9.6.0/24 10.9.5.0/24 10.200.200.0/24 ]]
EXTERNAL_NET = [[ !$HOME_NET ]]
include('/etc/snort/snort.lua')
snort = {
['-Q'] = true,
['--daq'] = 'nfq',
--['--daq-dir'] = '/usr/lib/daq/',
['--max-packet-threads'] = 8,
}
ips = {
mode = 'inline',
variables = default_variables,
action_override = 'drop',
include = '/etc/snort/' .. RULE_PATH .. '/snort.rules',
}
daq = {
inputs = { '4', '5', '6', '7', '8', '9', '10', '11', },
snaplen = 65531,
module_dirs = { '/usr/lib/daq/', },
modules = {
{
name = 'nfq',
mode = 'inline',
variables = { 'device=eth1', 'queue_maxlen=8192', 'fanout_type=lb', 'fail_open', },
}
}
}
alert_syslog = {
level = 'info',
}
-- Note that this is also the location of the PID file, if you use it.
output.logdir = '/mnt/data'
-- alert_full = { file = true, }
--[[
alert_fast = {
-- bool alert_fast.file = false: output to alert_fast.txt instead of stdout
-- bool alert_fast.packet = false: output packet dump with alert
-- int alert_fast.limit = 0: set maximum size in MB before rollover (0 is unlimited) { 0:maxSZ }
file = true,
packet = false,
}
--]]
alert_json = {
-- bool alert_json.file = false: output to alert_json.txt instead of stdout
-- int alert_json.limit = 0: set maximum size in MB before rollover (0 is unlimited) { 0:maxSZ }
-- string alert_json.separator = , : separate fields with this character sequence
-- multi alert_json.fields = 'timestamp pkt_num proto pkt_gen pkt_len dir src_ap dst_ap'
-- Rule action: selected fields will be output in given order left to right.
-- { action | class | b64_data | client_bytes | client_pkts | dir
-- | dst_addr | dst_ap | dst_port | eth_dst | eth_len | eth_src
-- | eth_type | flowstart_time | geneve_vni | gid | icmp_code
-- | icmp_id | icmp_seq | icmp_type | iface | ip_id | ip_len
-- | msg | mpls | pkt_gen | pkt_len | pkt_num | priority
-- | proto | rev | rule | seconds | server_bytes | server_pkts
-- | service | sgt | sid | src_addr | src_ap | src_port | target
-- | tcp_ack | tcp_flags | tcp_len | tcp_seq | tcp_win | timestamp
-- | tos | ttl | udp_len | vlan }
-- This is a minimal set of fields that simply supports 'snort-mgr report'
-- and minimizes log size:
fields = 'dir src_ap dst_ap msg',
-- This set also supports the report, but closely matches 'alert_fast' contents.
--fields = 'timestamp pkt_num proto pkt_gen pkt_len dir src_ap dst_ap rule action msg',
file = true,
}
--[[
unified2 = {
limit = 10, -- int unified2.limit = 0: set maximum size in MB before rollover (0 is unlimited) { 0:maxSZ }
}
--]]
normalizer = {
tcp = {
ips = true,
}
}
file_policy = {
enable_type = true,
enable_signature = true,
rules = {
use = {
verdict = 'log',
enable_file_type = true,
enable_file_signature = true,
}
}
}
-- To use openappid with snort, 'opkg install openappid' and enable in config.
-- The following content from included file '/etc/snort/include.snort'
-- Disable output to syslog
alert_syslog = 'nil'
alert_json = 'nil'
-- Enable output to alert_fast.txt
alert_fast = {
file = true,
packet = false,
}
-- This section modifies the json output to be compatible with 'snort-mgr report',
-- but includes all the fields you would see when using 'alert_fast'.
--alert_json = {
-- fields = 'timestamp pkt_num proto pkt_gen pkt_len dir src_ap dst_ap rule action msg',
-- file = true,
--}
suppress = {
-- this kills stuff in lxc
{
gid = '1', sid = '650', track = 'by_dst', ip = '10.9.8.101'
},
}
network = {
checksum_eval = 'none',
}
search_engine = {
search_method = 'hyperscan',
offload_search_method ='hyperscan',
detect_raw_tcp = true,
}
detection = {
hyperscan_literals = true,
pcre_to_regex = true,
}
Yup, include.snort is just a lua blob, so is subject to this same thing. As you've already deduced, true, false, ints, floats, nil and so on are the pre-defined built-ins (maybe more, those are of course most common).
The only part of the auto-generated code about which I have questions is the daq.modules.variables value, but I copied its format from the snort distribution's talos.lua and inline.lua, so I think it's correct. All the values are strings of the form 'setting=value'; if you change them to "field references" of the form setting='value', you get snort errors.
In your include.snort, I see only one thing "wrong" (but it probably works ok anyhow, apparently snort is very lenient "I don't know what this is, so I'll just use the default"). The first two lines
alert_xxx = 'nil'
which shouldn't have quotes on nil (it's intended to be a real nil).
All the rest of the paths and numbers and values look ok to me.
Also note that double brackets, [[string]], are lua's form of multi-line string (if you're familiar with python, it's just like the triple quotes).
Additionally, the brackets can be used inside a comment, too, so that unified2 blob is a multi-line comment:
--[[
unified2 = {
limit = 10, -- int unified2.limit = 0: set maximum size in MB before rollover (0 is unlimited) { 0:maxSZ }
}
--]]
This is exactly why I prefer the command line version: the lua version simply offers too much room for error, and for non-programmers it is simply incomprehensible. Apart from that, Snort gives priority to command line parameters, so no matter what is in the lua, if the command line has a parameter with a different value, that value is used. As I said, it would make more sense to write a script that rewrites the config options into command line parameters with which Snort is started. You can then identify errors based on the attached parameters, which is much easier for non-experts.
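As a rough idea of what such a wrapper could produce, here is a dry-run sketch. The function name build_snort_cmd is made up, the values are passed in by hand rather than read from /etc/config/snort, and the flags (-c, -Q, --daq, --daq-var, --snaplen, -z) are the Snort 3 options as I remember them from snort --help, so check them against your installed version. The queue numbers (daq inputs) are left in the lua config here.

```shell
#!/bin/sh
# Print a Snort 3 command line built from explicit settings (dry run only).
# $1: queue_maxlen, $2: snaplen, $3: number of packet threads
build_snort_cmd() {
    maxlen=$1 snaplen=$2 threads=$3
    # Collect the arguments in the positional parameters, then print them.
    set -- snort -c /etc/snort/snort.lua -Q --daq nfq
    set -- "$@" --daq-var "queue_maxlen=$maxlen" --daq-var fanout_type=lb
    set -- "$@" --snaplen "$snaplen" -z "$threads"
    echo "$*"
}
```

Printing the line first, for example with build_snort_cmd 8192 65531 8, makes it easy to see exactly which values Snort would be started with before anything is wired into init.d.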
Both thread_count and snaplen are ints per the snort reference manual, so don't need any quoting in that context. (I suspect they'd work either way, though, snort probably does an int(x) call around them internally, as part of its argument parsing code.)
Evidence that thread_count is working would be the number of alert files generated, as snort creates one per thread and in my testing I'm seeing 8 of them.
@xxxx @efahl - is there any way I can take my old setup (using local.lua and running with snort -c /etc/snort/snort.lua --tweaks local) and get snort to parse that out into a cmdline set of flags? My goal is to figure out what is different between the old working setup and the new setup throwing those nfnetlink_queue drops.
@efahl - could the difference be in how your new code sets up the firewall chain?
old method:
/etc/snort/snort-table.sh
#!/bin/sh
verbose=false
nft list tables | grep -q 'snort' && nft flush table inet snort
nft -f - <<TABLE
table inet snort {
chain IPS {
type filter hook postrouting priority 225; policy accept;
ct state invalid drop;
# Add accept or drop rules here to bypass snort or to drop traffic that snort should not see.
# Note that if NAT is enabled, snort will only see the address of the outgoing device for outgoing traffic,
# for example the WAN IP address for the WAN port or, if you are using a VPN, the address of the virtual adapter.
oifname "eth1" tcp flags ack ct state established counter accept
# "eth1" must be changed to the appropriate WAN port on the target system. A VPN needs a second rule with the name of the virtual VPN WAN port.
counter queue flags bypass to 4-11
}
}
TABLE
$verbose && nft list table inet snort
exit 0
new method:
/etc/snort/include.nfq
ct state invalid drop;
oifname "{{ snort.interface }}" tcp flags ack ct state established counter accept
I am not sure which parts of @efahl's code run and build on /etc/snort/include.nfq... Is nft list ruleset the right command to show everything?
If so, I can capture that output under the old/working configuration and then again under the new configuration and look for differences.
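A small sketch of that capture-and-compare step, with the assumptions spelled out: the function names and the SNAP_DIR and NFT variables are made up for illustration, diff must be installed on the router, and the sed expression assumes counters are listed as "counter packets N bytes M".

```shell
#!/bin/sh
# NFT and SNAP_DIR can be overridden; the defaults suit a typical router.
NFT="${NFT:-nft}"
SNAP_DIR="${SNAP_DIR:-/tmp}"

# Save the full ruleset under a label such as "old" or "new".
# Counter values are stripped so that only rule changes show up later.
snapshot_ruleset() {
    "$NFT" list ruleset |
        sed -E 's/counter packets [0-9]+ bytes [0-9]+/counter/' \
        > "$SNAP_DIR/ruleset.$1.txt"
}

# Show what changed between two labelled snapshots.
compare_rulesets() {
    diff -u "$SNAP_DIR/ruleset.$1.txt" "$SNAP_DIR/ruleset.$2.txt"
}
```

The idea is to run snapshot_ruleset old under the working setup, snapshot_ruleset new after switching, and then compare_rulesets old new. Without stripping the counters, nearly every rule line would show up as changed.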
EDIT: I diffed the output of those two commands under the old setup and the new setup. There are minor changes. Could these be driving the dropped packets?
Old (working) is on left and new (dropped packets) is on right:
No, they are almost the same, except that the new one sits later in the chain priority order, which doesn't make much difference in terms of speed. But the new code is outdated. A while ago I wrote the Snort queue directly into the fw4 table because it is simply faster, and I also changed the code so that WAN detection works automatically. That makes additional shares for several internal networks superfluous, because Snort only checks what goes in and out via the WAN port.
@efahl could you rewrite your scripts so that the /etc/config/snort options are passed to Snort via the command line? Then you could start Snort via the script and you would still need to deliver a standard Snort.lua. This would greatly limit the possibility of errors, especially if you could include functions such as cpu priority, cpu pinning, rule version query.
Is the new code here what @efahl wrote into his PR, or is it my /etc/snort/include.nfq?
# cat /etc/snort/include.nfq
ct state invalid drop;
oifname "{{ snort.interface }}" tcp flags ack ct state established counter accept
And:
# snort-mgr print nftables
# Do not edit, automatically generated. See /usr/share/snort/templates.
table inet snort {
chain postrouting_ips {
type filter hook postrouting priority 300
policy accept
#-- The following content included from '/etc/snort/include.nfq'
ct state invalid drop;
oifname "eth1" tcp flags ack ct state established counter accept
#-- End of included file.
counter queue flags bypass to 4-11
}
}
Yes, as I said, everything would have to be revised. The high hook priority is also not ideal because of WireGuard, which is why I had set it down to 225.
It should look like this now; I hope I haven't overlooked anything.
table inet fw4 {
chain IPS {
type filter hook postrouting priority 225
policy accept
oifname { $(uci get network.wan.device),Wg0 } tcp flags ack ct state established counter accept
oifname $(uci get network.wan.device) udp dport 51820 counter accept
oifname { $(uci get network.wan.device),Wg0 } counter queue flags bypass to 4-11
iifname { $(uci get network.wan.device),Wg0 } counter queue flags bypass to 4-11
}
}
The uci command gets the WAN Ethernet port directly from the OpenWrt config, and the iifname and oifname matches ensure that only the traffic running over the WAN port is processed by Snort. Including Wg0 does no harm: nftables does not give an error if there is no WireGuard interface, and the rules still work.
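If the uci lookup ever returns nothing, the rules above would be generated with an empty interface name. One possible way to guard against that is to resolve the device once and refuse to continue when it is empty. This is only a sketch: the function name emit_ips_chain is made up, and the rules are copied from the table above.

```shell
#!/bin/sh
# Emit the IPS chain for a given WAN device; the output is meant for "nft -f -".
# $1: WAN device name, e.g. the output of: uci get network.wan.device
emit_ips_chain() {
    wan=$1
    # Refuse to generate rules with an empty interface name.
    [ -n "$wan" ] || { echo "no WAN device given" >&2; return 1; }
    cat <<TABLE
table inet fw4 {
    chain IPS {
        type filter hook postrouting priority 225
        policy accept
        oifname { $wan,Wg0 } tcp flags ack ct state established counter accept
        oifname $wan udp dport 51820 counter accept
        oifname { $wan,Wg0 } counter queue flags bypass to 4-11
        iifname { $wan,Wg0 } counter queue flags bypass to 4-11
    }
}
TABLE
}
```

A cautious way to use it is emit_ips_chain "$(uci get network.wan.device)" | nft -c -f - first, since -c only checks the input without applying it, and then to repeat the command without -c.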