How to debug a do_page_fault/SIGSEGV?

The software is only going to work on the device since it needs the specific FXS ports in its hardware.
Anyway if I don't manage to debug the issue I'll keep using it with the old openwrt.

The problem is that now I don't get either.

this will save a core dump ...

However, the problem is right there in the error message: svd is trying to read from memory it does not own. The question is why it owns that memory in the working environment but not in the non-working (updated) environment. I would use the crude printf technique to home in on the area of the code where the conflict is, then start looking at the values being read and print them out, which would eventually lead me to the invalid read. From there you can determine how to rectify the conflict.
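A minimal sketch of the crude printf technique, assuming the daemon runs in the foreground so stderr is visible. TRACE and trace_location are hypothetical names for illustration, not part of svd. The point is to flush after every message so the last line printed before a crash is not lost in a buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a "file:line func" marker into buf.
 * Returns whatever snprintf returns (the length the full marker needs). */
static int trace_location(char *buf, size_t len,
                          const char *file, int line, const char *func)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s:%d %s", file, line, func);
}

/* Print a location marker plus a formatted message to stderr, then flush
 * immediately so the output survives a crash on the next statement. */
#define TRACE(...)                                                        \
    do {                                                                  \
        char loc_[128];                                                   \
        trace_location(loc_, sizeof loc_, __FILE__, __LINE__, __func__);  \
        fprintf(stderr, "[trace] %s: ", loc_);                            \
        fprintf(stderr, __VA_ARGS__);                                     \
        fputc('\n', stderr);                                              \
        fflush(stderr);                                                   \
    } while (0)
```

Typical use would be something like TRACE("len=%d ptr=%p", len, (void *)ptr); placed before each suspicious read, moving the calls closer together until the last message printed brackets the faulting statement.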

But I don't get that anymore with sofia-sip 1.13

So, I removed the daemonization (so I keep stdout open) and I got a "Bus Error".
I added a lot of printf (in unrelated places) and I don't get the Bus error anymore (I get a real error that I'll investigate).
I read in an article (which I cannot find right now) that the compiler can generate unaligned accesses in a seemingly random fashion: you modify something unrelated to the location causing the issue and the problem goes away, only to resurface later when you make more changes. I suspect it is something like that. I'll see if I can find the article (I don't remember if it gave a solution).
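For illustration only, here is one common way an unaligned access ends up in C code: casting a byte buffer to a wider type. Whether this is what happens in svd is an assumption, not something confirmed in this thread, and read_u32_unaligned is a hypothetical name. The sketch shows the risky pattern as a comment and a memcpy-based alternative that is safe at any alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Risky pattern (shown only as a comment): dereferencing a cast pointer.
 *     uint32_t v = *(const uint32_t *)(buf + 1);
 * If buf + 1 is not suitably aligned this is undefined behavior, and it
 * can raise SIGBUS on CPUs that do not handle unaligned loads in hardware.
 * Whether it faults depends on where buf happens to land, which unrelated
 * code changes can shift. */

/* Safe alternative: copy the bytes into a properly aligned local.
 * The value is read in the host's native byte order. */
static uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers usually turn the memcpy into a plain load where the target allows it, so the safe version rarely costs anything.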

To me, that suggests that the error (conflict) is being handled gracefully in the newer version, though normally one would print something somewhere to signal that an error has occurred. Have you been checking syslog and dmesg for hints? Is there possibly a log file created by the executable, most likely in /tmp?
Really, we can guess at it all day. I would do as I suggested in my last post to find the issue, if it's worth it to you.

Of course I checked dmesg and syslog, but look at my last message: adding printf in unrelated parts of the code made the problem go away (I still have to check if it actually works, but at least it doesn't generate a "Bus Error" anymore).

To me, that suggests that you're dealing with an uninitialized value. In C, you can declare a variable without initializing it, but in most cases you cannot use it before it has been initialized. Most functions, unless they explicitly check, will bail on uninitialized (NULL) values. printf is sort of an exception: for the most part it has no problem with uninitialized values, it simply prints (NULL). However, once this happens, your value has now been initialized, hence no error.

Only one of the printf I added deals with a pointer and confirms that it was correctly initialized, so, no, it must be something else.

Well no, not true. Any value in C must be initialized before use. Failing to do so produces undefined behavior.

Take, for example, the following:

int a; 
printf("%d\n", a);

What's the value of a? Is it an integer? A char? A NULL?

Right, the value of a here is undefined, but that won't cause a memory access error (the size of a is perfectly defined).

Right, but what happens when you use the value of a for something, now that it's initialized to an undefined type...

good luck

It won't fix itself by means of self-confidence. Drill through -Wall, the clang analyzer, etc.

I understand your concern and appreciate your help, but that's not the case here: svd is working without crashing now (well, I have to hook up a phone, but at least it is registering with the sip providers).
See for yourself the changes that "fixed" it (and that I should remove).

It's not self confidence, this code has worked flawlessly for more than 10 years (almost 14 actually) and I don't remember the details 100% but I'd say that all allocations are checked and all the variables are initialized.
Over the years the changes in the infrastructure (compiler/build-system/etc.) or in the underlying library have caused some problem, but I think that the code itself is quite solid.

I am up to my elbows in kernel code at the moment and don't have time to peek, but it's good to hear you've got it running. It may be wise to try to debug it now ...

I hooked up a phone and made a call and everything works, but, no, I don't consider it fixed, those printf don't change the logic at all (and shouldn't be there at all). I hope I'll find that article.
Debugging it on the device is nigh on impossible (it's enough of a pain to get it installed with all its dependencies) and outside the device I'm missing the needed hardware (which is embedded in the SOC).

Check with elfdump what's at offset 0x110... and try to dig down to the failing function, then try to match it against the static-check findings.

But it doesn't crash anymore.
Later I'll probably remove the printfs I added to see what happens, but the "Bus Error" wouldn't leave any other trace in dmesg, so I don't know where it was failing.
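Since the "Bus Error" leaves no trace, one option is to have the process report the faulting address itself. The following is only a sketch, assuming a POSIX system where sigaction with SA_SIGINFO is available. format_hex, fault_handler and install_fault_handlers are hypothetical names, not part of svd. It prints the address the bad access pointed at (si_addr), not the instruction that made the access; getting the program counter requires architecture-specific context that is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x" plus hex digits without printf, which is not
 * async-signal-safe. Returns the number of bytes written (no NUL).
 * The assert only fires on programmer error (buffer too small). */
static size_t format_hex(uintptr_t v, char *buf, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, out = 0;

    assert(buf != NULL && len >= sizeof tmp + 2);
    do {                          /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[out++] = '0';
    buf[out++] = 'x';
    while (n > 0)                 /* emit them most significant first */
        buf[out++] = tmp[--n];
    return out;
}

/* write() wrapper that deliberately ignores the result: there is nothing
 * useful left to do if reporting fails inside a fault handler. */
static void put_bytes(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

/* Report which signal arrived and the faulting address, then exit. */
static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    static const char msg_bus[] = "SIGBUS at address ";
    static const char msg_segv[] = "SIGSEGV at address ";
    char buf[2 * sizeof(uintptr_t) + 2];
    size_t n;

    (void)ctx;
    if (sig == SIGBUS)
        put_bytes(msg_bus, sizeof msg_bus - 1);
    else
        put_bytes(msg_segv, sizeof msg_segv - 1);
    n = format_hex((uintptr_t)info->si_addr, buf, sizeof buf);
    put_bytes(buf, n);
    put_bytes("\n", 1);
    _exit(128 + sig);
}

/* Install the handler for SIGBUS and SIGSEGV. Returns 0 on success. */
static int install_fault_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_fault_handlers() early in main would be enough. An address that is merely odd or not a multiple of 4 would point toward an alignment problem, while a small or wild address would point toward a bad pointer.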
Now I'm trying to build an image with everything integrated (even luci) and see how it goes.

If you still have the original crashing binary, check it. Say, a malloc that is one byte short would only fault when the buffer ends right at a page border and the overrun hits a guard page.
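To make that scenario reproducible instead of layout-dependent, a debug build can place each suspect buffer directly in front of an inaccessible page. This is a rough sketch of the idea, assuming a system with mmap and mprotect. guarded_alloc is a hypothetical helper, not an existing API, and it never frees its memory, so it is only suitable for a short debugging session.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical debug allocator: returns `size` bytes whose last byte sits
 * immediately before a PROT_NONE guard page, so touching even one byte
 * past the end faults at once instead of silently corrupting memory.
 * The returned pointer is only as aligned as `size` allows, so round the
 * size up on strict-alignment CPUs. Returns NULL on failure. */
static void *guarded_alloc(size_t size)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page, data_len;
    unsigned char *base;

    assert(size > 0);
    if (ps <= 0)
        return NULL;
    page = (size_t)ps;
    data_len = (size + page - 1) / page * page;   /* round up to whole pages */

    base = mmap(NULL, data_len + page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the final page inaccessible: this is the guard page. */
    if (mprotect(base + data_len, page, PROT_NONE) != 0) {
        munmap(base, data_len + page);
        return NULL;
    }
    return base + data_len - size;   /* buffer ends where the guard begins */
}
```

Swapping this in for the allocation under suspicion turns an off-by-one overrun into an immediate fault at the offending access, regardless of where the heap happens to place things.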