New kernel module won't unload

I have a skeleton driver with init and exit that I'm loading at runtime with insmod. When I go to unload it, rmmod reports that unloading failed. When I list the driver with lsmod it shows [permanent], as if there is no exit method defined but there is - I'm assuming the [permanent] is what's preventing the driver from unloading. Here's the code:

#include <linux/module.h>    // included for all kernel modules
#include <linux/kernel.h>    // included for KERN_INFO
#include <linux/init.h>      // included for __init and __exit macros

MODULE_LICENSE("GPL");
MODULE_AUTHOR("H");
MODULE_DESCRIPTION("Reproduce Bug");

static int __init ipq8065_sqrbug_init(void)
{
    printk(KERN_INFO "ipq8065-sqrbug init!\n");
    return 0;    // Non-zero return means that the module couldn't be loaded.
}

static void __exit ipq8065_sqrbug_exit(void)
{
    printk(KERN_INFO "ipq8065-sqrbug exit\n");
}

module_init(ipq8065_sqrbug_init);
module_exit(ipq8065_sqrbug_exit);

Here is my .mk:

#
# Copyright (C) 2008-2012 OpenWrt.org
#
# This is free software, licensed under the GNU General Public License v2.
# See /LICENSE for more information.
#

include $(TOPDIR)/rules.mk
include $(INCLUDE_DIR)/kernel.mk

PKG_NAME:=ipq8065-sqrbug-driver
PKG_RELEASE:=3
PKG_LICENSE:=GPL-2.0

include $(INCLUDE_DIR)/package.mk

define KernelPackage/ipq8065-sqrbug-driver
  SUBMENU:=Other modules
  TITLE:=Driver to reproduce bug
  FILES:=$(PKG_BUILD_DIR)/ipq8065-sqrbug-driver.ko
  AUTOLOAD:=$(call AutoLoad,30,ipq8065-sqrbug-driver,1)
  KCONFIG:=
endef

define KernelPackage/ipq8065-sqrbug-driver/description
 Driver to reproduce bug (description)
endef

MAKE_OPTS:= \
    $(KERNEL_MAKE_FLAGS) \
    M="$(PKG_BUILD_DIR)"

define Build/Compile
    $(MAKE) -C "$(LINUX_DIR)" \
        $(MAKE_OPTS) \
        modules
endef

$(eval $(call KernelPackage,ipq8065-sqrbug-driver))

A quick Google search suggests a GCC version mismatch might cause this, especially given that your code otherwise looks normal.
OR
The toolchain was built with hardening options that might block out-of-tree kmods.

That's because the OpenWrt default kernel config does not have CONFIG_MODULE_FORCE_UNLOAD=y set.

You'll need to recompile a kernel with this option set to enable module unloading.

CONFIG_MODULE_FORCE_UNLOAD is only necessary if there's no exit function defined for the module. Any module can be unloaded provided it has a defined exit function and the kernel was compiled with CONFIG_MODULE_UNLOAD, which I know it was because I'm able to successfully unload the discrete modules compiled into the kernel image and available in /lib/modules/$(uname -r).

For some reason the kernel doesn't think my module has an exit function. And it's not just my module - it occurs for any of the built-in OpenWrt modules I compile myself and try to unload - the same modules that are in /lib/modules and that do unload. For example, leds-gpio.ko.

I've been poring over the kernel source, mostly module.c, and can't find where the code ever sets mod->init or mod->exit. I know it's finding my init function since that is being called. Here is the relevant code from module.c (delete_module):

	/* If it has an init func, it must have an exit func to unload */
	if (mod->init && !mod->exit) {
		forced = try_force_unload(flags);
		if (!forced) {
			/* This module can't be removed */
			ret = -EBUSY;
			goto out;
		}
	}
	/* Stop the machine so refcounts can't move and disable module. */
	ret = try_stop_module(mod, flags, &forced);
	if (ret != 0)
		goto out;

	mutex_unlock(&module_mutex);
	/* Final destruction now no one is using it. */
	if (mod->exit != NULL)
		mod->exit();
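
For anyone following along, the same condition appears to be what drives the [permanent] marker in lsmod. Here is a minimal user-space sketch of that check, not the kernel's actual code: struct mini_module, demo_init, demo_exit and is_permanent are all made-up names for illustration, and the real struct module has many more fields.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two struct module fields that matter here. */
struct mini_module {
    int  (*init)(void);
    void (*exit)(void);
};

/* Placeholder entry points, only so the pointers have something to point at. */
static int  demo_init(void) { return 0; }
static void demo_exit(void) { }

/*
 * Returns 1 when the module would be treated as permanent: it has an init
 * function but no exit function that the kernel can see.
 */
static int is_permanent(const struct mini_module *mod)
{
    return mod->init != NULL && mod->exit == NULL;
}
```

A module with both pointers set is unloadable; one with init set and exit NULL is the case that gets refused with -EBUSY unless a forced unload is allowed.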

Yes agreed. It shouldn't be necessary. I found the same behaviour as you and could not track down the cause either. Which is why I resorted to using the force unload option. So I'm definitely interested in the answer if you can track it down.

So technically speaking, one needs to rmmod -f to force unload a module even with this kernel option configured. What I found was that if I did not have it enabled, I could not unload modules, similarly to you.

With it enabled I could unload modules with a simple rmmod without the -f flag.

root@OpenWrt:~# lsmod | grep usdm_drv
usdm_drv               90112  0 
root@OpenWrt:~# rmmod usdm_drv
root@OpenWrt:~# lsmod | grep usdm_drv
root@OpenWrt:~# dmesg | grep usdm
[92866.360317] usdm_drv: Unloading USDM Module Version 0.7.1...

So, this shows the driver unloading and its exit routine being called. However, without the FORCE_UNLOAD option I cannot unload this module.

Thanks. I'm determined to get to the bottom of it. It's clear the cause is module.c thinking our modules don't have an exit function. What's not clear is why - it's able to find our init functions, and both init and exit are referenced in the same .gnu.linkonce.this_module section of the ELF, so it shouldn't be a matter of the compiler packaging or naming the sections differently. Just to be sure, I compared the leds-gpio.ko I downloaded from my router at /lib/modules/... (which can unload) to the same module I compiled myself (which can't) - they both have identical .gnu.linkonce.this_module section info.

I've grep'd and re-grep'd the kernel code countless times and still can't find where init or exit are set. I see where a version of init/exit is set in modutils-24.c, which is the Busybox user-space app that implements insmod. The structure those are set in is then passed to the kernel's module.c::init_module() via a system call, and that code does reference many of the sections and data built by modutils-24.c but not the init/cleanup pointers it sets. The search continues...

In the process I've become very familiar with the entire module-loading path, so at least the time spent is educational.

I'm curious what happens if you compile a kernel with the FORCE_UNLOAD option and then try to do a normal rmmod without the -f to see if your module's exit routine gets called...

Based on the module.c source I posted above, the presence of FORCE_UNLOAD shouldn't have any impact - it will call mod->exit() if module.c thinks it's available, for both the non-forced and forced cases.

Yep, I just looked at the code too. The definition is only used once in module.c

This is a relatively old post, but it may be the reason for the behaviour you're experiencing. At a quick glance, the dependency still appears to be there on my x86_64 machine: http://pritambankar.blogspot.com/2012/10/solution-to-problem-of-module-getting.html

Thanks. Tried it and it didn't make a difference. If only I could find the relevant kernel source that sets mod->exit...

It's a whole bunch of rather hard to read preprocessor macros in include/linux/module.h and include/linux/init.h.

What about trying, instead of using the module_exit() macro, to define your exit function thus

void cleanup_module(void)
{
    ...
}

Do you still get the same problem?

I think I found the issue. I said earlier that I compared the ".gnu.linkonce.this_module" sections of two instances of the leds-gpio.ko module - one that is built into my router's kernel image and one I built myself - and that they were identical. I revisited that comparison and found I hadn't noticed that one relocation offset was actually different. Here's the section dump from the built-in module (which unloads successfully):

d8: R_ARM_ABS32 _52
150: R_ARM_ABS32   _46

(the _52 and _46 correspond to init and cleanup - the symbolic names have been stripped, I'm assuming to save image space)

And the section dump of the leds-gpio.ko I built on my system (which is marked 'permanent' by lsmod and doesn't unload):

d8: R_ARM_ABS32 init_module
178: R_ARM_ABS32    cleanup_module

So the offset to exit/cleanup_module has moved from 0x150 (4.14.171 kernel) to 0x178 (4.19.108 kernel) in the section, which is why the 4.14.171 kernel I'm running can't unload the module I built using the 4.19.108 build environment. I still haven't found the relevant kernel code for this. On the build side, this section is generated in our modules by the MODPOST portion of the build, which emits a .mod.o object file containing the ".gnu.linkonce.this_module" section. I'm going to dig into modifying that to change the offset, after which I assume it'll work.

I'm not sure if they moved the offset of the exit method in the section as a normal course of modifying the software or if it's done intentionally to break unloading, perhaps if there's some non-backward compatible change to the unloading process.

Oh, for sure that's the problem. I had no idea your build and runtime environment were not matched. Your modules need to match exactly the running kernel... (I've understood you to mean that you built a module against a 4.19 kernel and you're loading it on a 4.14 kernel)
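
One quick sanity check for that kind of mismatch is to compare the kernel release at the front of the module's vermagic string with what the router is running. A rough sketch only: vermagic_release_matches is a made-up helper, not the kernel's own check, and it assumes the release is the first space-delimited token of vermagic.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the first space-delimited token of a
 * module's vermagic string equals the given kernel release string.
 */
static int vermagic_release_matches(const char *vermagic, const char *release)
{
    size_t token_len = strcspn(vermagic, " ");

    return strlen(release) == token_len &&
           strncmp(vermagic, release, token_len) == 0;
}
```

The release would come from uname -r on the target, and the vermagic string from the module itself (modinfo -F vermagic on systems that have kmod's modinfo). A module built in a 4.19.108 tree would fail this check against a running 4.14.171 kernel.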

You're having an unloading problem even with the kernel and build environment matched? Perhaps the following might be useful to you, although I'm not sure how the structure contents could change if you have the versions matched.

Got to the bottom of how this is laid out - seems obvious in retrospect :frowning: The ".gnu.linkonce.this_module" section generated by MODPOST in <module_name>.mod.o is actually the struct module from module.h, with almost all of the fields filled in at runtime by the kernel's module.c but with a few specified at build time, such as the module's name and its init and exit points. This is why I couldn't find the code in the kernel that sets the init/exit fields - the this_module section is pulled in and becomes the runtime structure, with the init/exit fields already set at build time (they get their relocation addresses fixed up at runtime based on where the module is loaded).

Since the contents of the section become the runtime struct, there's no way to hack the build to put the exit function pointer at the previous kernel's offset in the structure: it would overlay a different field and be overwritten at runtime anyway. This becomes obvious when looking at modpost.c::add_header(), which actually emits the struct module source that gets compiled into the this_module section:

static void add_header(struct buffer *b, struct module *mod)
{
	buf_printf(b, "#include <linux/module.h>\n");
	buf_printf(b, "#include <linux/vermagic.h>\n");
	buf_printf(b, "#include <linux/compiler.h>\n");
	buf_printf(b, "\n");
	buf_printf(b, "MODULE_INFO(vermagic, VERMAGIC_STRING);\n");
	buf_printf(b, "\n");
	buf_printf(b, "__visible struct module __this_module\n");
	buf_printf(b, "__attribute__((section(\".gnu.linkonce.this_module\"))) = {\n");
	buf_printf(b, "\t.name = KBUILD_MODNAME,\n");
	if (mod->has_init)
		buf_printf(b, "\t.init = init_module,\n");
	if (mod->has_cleanup)
		buf_printf(b, "#ifdef CONFIG_MODULE_UNLOAD\n"
			      "\t.exit = cleanup_module,\n"
			      "#endif\n");
	buf_printf(b, "\t.arch = MODULE_ARCH_INIT,\n");
	buf_printf(b, "};\n");
}
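
To make the overlay effect concrete, here is a small user-space sketch. It is only an illustration under made-up assumptions: struct module_old, struct module_new and the extra_state field are invented and far smaller than the real struct module, but they show how one added field ahead of exit leaves init where the old kernel expects it while the exit slot reads as NULL.

```c
#include <stddef.h>
#include <string.h>

typedef void (*exit_fn)(void);

/* Hypothetical layout that the running (older) kernel expects. */
struct module_old {
    char    name[56];
    int   (*init)(void);
    exit_fn exit;
};

/* Hypothetical layout emitted by the (newer) build tree: one extra field
 * ahead of exit shifts its offset, while init stays put. */
struct module_new {
    char    name[56];
    int   (*init)(void);
    void   *extra_state;   /* invented field standing in for whatever was added */
    exit_fn exit;
};

/* Placeholder entry points for the demo image. */
static int  overlay_demo_init(void) { return 0; }
static void overlay_demo_cleanup(void) { }

/*
 * Build an image using the new layout, then read the exit slot the way code
 * compiled against the old layout would.
 */
static exit_fn exit_seen_by_old_kernel(void)
{
    struct module_new built;
    struct module_old seen;

    memset(&built, 0, sizeof(built));
    built.init = overlay_demo_init;
    built.exit = overlay_demo_cleanup;

    /* Same bytes, interpreted through the old struct definition. */
    memcpy(&seen, &built, sizeof(seen));
    return seen.exit;
}
```

Here the old layout finds the zeroed extra_state bytes where it expects exit, which matches what I saw: init gets called, but the module looks like it has no exit function.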

Facepalm :frowning: