Discussion:
5.3.7 64-bits kernel doesn't boot on G5 Quad
Romain Dolbeau
2019-11-07 08:20:01 UTC
Hello,

The current linux-image package (5.3.7-1) in debian-ports won't boot
on my G5 Quad (dual 970MP).
I'm currently running the powerpc port (32-bit userland), not ppc64,
but I assume it makes no difference for the 64-bit kernel.

I have the same error as previously reported for 5.2.7 in:
<https://lists.debian.org/debian-powerpc/2019/08/msg00004.html>

Kernel packages:

#####
powermacg5:~$ dpkg -l linux-image-*
(...)
ii linux-image-4.19.0-5-powerpc64 4.19.37-5 powerpc
Linux 4.19 for 64-bit PowerPC
ii linux-image-4.9.0-0.bpo.6-powerpc64 4.9.88-1+deb9u1~bpo8+1 powerpc
Linux 4.9 for 64-bit PowerPC
ii linux-image-5.3.0-1-powerpc64 5.3.7-1 powerpc
Linux 5.3 for 64-bit PowerPC
#####

linux-image-4.9.0-0.bpo.6-powerpc64 is from jessie-backports,
linux-image-4.19.0-5-powerpc64 is from the Fienix repository (can't
find a Debian build for ppc64 between 4.9 and 5.3).
Both work perfectly on the G5.

Any suggestion on how to identify/fix the bug ?

Thanks in advance & cordially,
--
Romain Dolbeau
John Paul Adrian Glaubitz
2019-11-07 08:40:02 UTC
Hi Romain!
Post by Romain Dolbeau
Any suggestion on how to identify/fix the bug ?
The answer here would be git bisect [1]. I would first start downloading
the current kernel source tarball for 5.3.7 from upstream and build the
kernel with "make localmodconfig" to see whether it's an upstream or Debian
problem.

If the upstream kernel doesn't work either, then you have to bisect to
find which commit broke it.
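
As a rough mental model of what bisecting with good/bad/skip verdicts does, here is a minimal Python sketch. It assumes a linear list of commits (real kernel history is a graph with merges, and git's own algorithm is more involved), and `test` is a hypothetical callback standing in for "build the kernel, boot it on the G5, report what happened".

```python
def bisect_first_bad(commits, test):
    """Narrow down the first bad commit in a linear history.

    commits: ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad.
    test:    hypothetical callback returning "good", "bad" or "skip".
    Returns the list of commits that may be the first bad one; more than
    one entry means skipped commits prevented a unique answer.
    """
    verdicts = {}  # cache, so no commit is built and booted twice

    def verdict(i):
        if i not in verdicts:
            verdicts[i] = test(commits[i])
        return verdicts[i]

    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try the midpoint first, then its nearest neighbours.
        for i in sorted(range(lo + 1, hi), key=lambda i: abs(i - mid)):
            if verdict(i) != "skip":
                break
        else:
            # Every commit strictly between lo and hi was skipped.
            return commits[lo + 1:hi + 1]
        if verdict(i) == "good":
            lo = i
        else:
            hi = i
    return [commits[hi]]
```

With no skips this needs about log2(N) build-and-boot cycles; every skipped commit near the culprit widens the final answer, which is why long runs of untestable commits make a bisect painful.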

Adrian
Post by Romain Dolbeau
[1] https://www.kernel.org/doc/html/v4.15/admin-guide/bug-bisect.html
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - ***@debian.org
`. `' Freie Universitaet Berlin - ***@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Mathieu Malaterre
2019-11-07 10:00:01 UTC
Hi Romain,

On Thu, Nov 7, 2019 at 9:34 AM John Paul Adrian Glaubitz
Post by John Paul Adrian Glaubitz
Hi Romain!
Post by Romain Dolbeau
Any suggestion on how to identify/fix the bug ?
The answer here would be git bisect [1]. I would first start downloading
the current kernel source tarball for 5.3.7 from upstream and build the
kernel with "make localmodconfig" to see whether it's an upstream or Debian
problem.
I suspect this is upstream and even before 5.3.7. The original report
Post by John Paul Adrian Glaubitz
If the upstream kernel doesn't work either, then you have to bisect to
find which commit broke it.
Adrian
Post by Romain Dolbeau
[1] https://www.kernel.org/doc/html/v4.15/admin-guide/bug-bisect.html
--
.''`. John Paul Adrian Glaubitz
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Romain Dolbeau
2019-11-10 10:50:01 UTC
Hello,

Following my bug report on debian-powerpc and the replies to it, I'm
adding linuxppc-dev to my answer.
Post by Mathieu Malaterre
On Thu, Nov 7, 2019 at 9:34 AM John Paul Adrian Glaubitz
Post by John Paul Adrian Glaubitz
The answer here would be git bisect [1]. I would first start downloading
the current kernel source tarball for 5.3.7 from upstream and build the
kernel with "make localmodconfig" to see whether it's an upstream or Debian
problem.
I suspect this is upstream and even before 5.3.7. The original report
Thanks to you guys I was able to cross-build a kernel (native would
have been way too slow) and try to bisect the bug.
However, I ended up in another buggy behavior that may or may not be related :-(
I tried to localize the bug(s) as best I could.

I attach to this the result of 'git bisect log' to see if someone can
figure out what's going on. Some of the commits were hand-picked to
avoid doing too many 'skip' configurations with the bug described
below.

Here is how I marked the commits:

*bad*: seems to behave the same as the Debian kernel or git HEAD, that
is, crash as reported (or very similar to, at least) by Christian.

*good*: either a full boot, or occasionally the kernel dropped me into
initramfs complaining about versioning of some symbols - I figured it
went far enough so it did not have the bug. The last one in the file
is a full boot as I remember it.

*skip*: most of them involve a hang just after the CPUs start (i.e.
I get 3 messages about kick-up for the extra CPUs and one about
bring-up being done, then nothing further). Those cover a lot of
commits :-(

(I used <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git>
and my build command was "make ARCH=powerpc
CROSS_COMPILE=powerpc64-linux-gnu- oldconfig && make ARCH=powerpc
CROSS_COMPILE=powerpc64-linux-gnu- -j56 bindeb-pkg", using default
values for oldconfig, starting with the config file from Debian at the
beginning, cross-gcc is 8.3 from buster, tested on a G5 Quad).

Any suggestion, advice, or patch to try welcome :-)

Thanks in advance & cordially,
--
Romain Dolbeau
Romain Dolbeau
2019-11-16 16:40:01 UTC
Post by Romain Dolbeau
Any suggestion, advice, or patch to try welcome :-)
From my bisect, I figured that
0034d395f89d9c092bb15adbabdca5283e258b41 was the likely culprit, but
that the bug was masked by the printk() issues that were fixed later:
commit 2ac5a3bf7042a1c4abbcce1b6f0ec61e5d3786c2 mentions "Report on
Power: Kernel crashes very early during boot". Once that was merged,
the current bug appears instead of the early crash.

So what I did was:

1) checkout 0034d395f89d9c092bb15adbabdca5283e258b41
2) merge printk-for-5.2 & printk-for-5.2-fixes from pmladek/printk,
hoping to remove the printk issue; there were only fairly trivial
conflicts

-> the resulting kernel displays the same bug as Debian's and vanilla
5.3 and HEAD, which is what I hoped for.

However, the merges did pick up quite a few commits, so I then:

3) did a revert of only 0034d395f89d9c092bb15adbabdca5283e258b41 ;
that did not cause conflicts (on HEAD, this causes a lot of conflicts
I don't know how to resolve).

-> the resulting kernel works fine ! :-)

So it seems to me that 0034d395f89d9c092bb15adbabdca5283e258b41
introduced the bug that crashes the PowerMac G5. Has anyone tried, or
could anyone try, some other PPC970-based system, like a JS20, to see
whether it's PowerMac-specific or not?

If anyone has an idea on how to fix this, that would be very welcome,
as I'm way out of my depth in the PPC64 MMU code.

Cordially,
--
Romain Dolbeau
Romain Dolbeau
2019-12-10 08:40:01 UTC
Hello,
Post by Romain Dolbeau
So it seems to me that 0034d395f89d9c092bb15adbabdca5283e258b41
introduced the bug that crashes the PowerMac G5
There have been some commits in that subsystem, so I tried again; as of
6794862a16ef41f753abd75c03a152836e4c8028, the kernel still crashes
when trying to boot my PowerMac G5.

Cordially,
--
Romain Dolbeau
John Paul Adrian Glaubitz
2019-12-10 10:30:01 UTC
Hi!
Post by Romain Dolbeau
Post by Romain Dolbeau
So it seems to me that 0034d395f89d9c092bb15adbabdca5283e258b41
introduced the bug that crashes the PowerMac G5
There's been some commits in that subsystem, so I tried again; as of
6794862a16ef41f753abd75c03a152836e4c8028, the kernel still crashes
when trying to boot my PowerMac G5.
If Aneesh is currently unable to look at the problem, I would suggest reverting
the commit in question since I don't think it's acceptable that users are unable
to boot their machines anymore after a kernel upgrade.

Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - ***@debian.org
`. `' Freie Universitaet Berlin - ***@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Aneesh Kumar K.V
2019-12-11 03:10:03 UTC
Post by John Paul Adrian Glaubitz
Hi!
Post by Romain Dolbeau
Post by Romain Dolbeau
So it seems to me that 0034d395f89d9c092bb15adbabdca5283e258b41
introduced the bug that crashes the PowerMac G5
There's been some commits in that subsystem, so I tried again; as of
6794862a16ef41f753abd75c03a152836e4c8028, the kernel still crashes
when trying to boot my PowerMac G5.
If Aneesh is currently unable to look at the problem, I would suggest reverting
the commit in question since I don't think it's acceptable that users are unable
to boot their machines anymore after a kernel upgrade.
The PowerMac system we have internally was not able to recreate this.
Hence we have not been able to make progress on this.

At this point, I am not sure what would cause the Machine check with
that patch series because we have not changed the VA bits in that patch.

-aneesh
jjhdiederen
2019-12-11 07:30:02 UTC
I have an iMac iSight with a 2.1 GHz PowerPC 970fx (G5) processor, that
boots fine with the latest ppc64 kernel.
On Wed, 11 Dec 2019 at 03:20, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
The PowerMac system we have internally was not able to recreate this.
To narrow down the issue - is that a PCI/PCI-X (7,3 [1]) or PCIe G5
(11,2 [1]) ?
Single, dual or quad ?
Same question to anyone else with a G5 / PPC970 - what is it and does
it boot recent PPC64 Linux kernel ?
Christian from the original report has a quad, like me (so
powermac11,2).
There was also a report of a powermac7.3 working in the original
discussion,
single or dual unspecified.
So this might be a Quad thing, or a more general 11,2 thing...
Post by Aneesh Kumar K.V
At this point, I am not sure what would cause the Machine check with
that patch series because we have not changed the VA bits in that patch.
Any test I could run that would help you tracking the bug ?
Cordially,
Romain
[1]
<https://en.wikipedia.org/wiki/Power_Mac_G5#Product_revision_history>
--
Romain Dolbeau
Romain Dolbeau
2019-12-11 07:30:02 UTC
On Wed, 11 Dec 2019 at 03:20, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
The PowerMac system we have internally was not able to recreate this.
To narrow down the issue - is that a PCI/PCI-X (7,3 [1]) or PCIe G5 (11,2 [1]) ?
Single, dual or quad ?

Same question to anyone else with a G5 / PPC970 - what is it and does
it boot recent PPC64 Linux kernel ?

Christian from the original report has a quad, like me (so powermac11,2).

There was also a report of a powermac7.3 working in the original discussion,
single or dual unspecified.

So this might be a Quad thing, or a more general 11,2 thing...
Post by Aneesh Kumar K.V
At this point, I am not sure what would cause the Machine check with
that patch series because we have not changed the VA bits in that patch.
Is there any test I could run that would help you track down the bug?

Cordially,

Romain

[1] <https://en.wikipedia.org/wiki/Power_Mac_G5#Product_revision_history>


--
Romain Dolbeau
jjhdiederen
2019-12-12 07:40:01 UTC
PowerMac 7,3 G5 2.5 DP PCI-X Mid-2004 is affected by this bug. The
machine freezes at boot with the new ppc64 kernel.


Regards,
Jeroen Diederen
On Wed, 11 Dec 2019 at 03:20, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
The PowerMac system we have internally was not able to recreate this.
To narrow down the issue - is that a PCI/PCI-X (7,3 [1]) or PCIe G5 (11,2 [1]) ?
Single, dual or quad ?
Same question to anyone else with a G5 / PPC970 - what is it and does
it boot recent PPC64 Linux kernel ?
Christian from the original report has a quad, like me (so
powermac11,2).
There was also a report of a powermac7.3 working in the original discussion,
single or dual unspecified.
So this might be a Quad thing, or a more general 11,2 thing...
Post by Aneesh Kumar K.V
At this point, I am not sure what would cause the Machine check with
that patch series because we have not changed the VA bits in that patch.
Any test I could run that would help you tracking the bug ?
Cordially,
Romain
[1]
<https://en.wikipedia.org/wiki/Power_Mac_G5#Product_revision_history>
--
Romain Dolbeau
Romain Dolbeau
2019-12-12 08:10:02 UTC
Post by jjhdiederen
PowerMac 7,3 G5 2.5 DP PCI-X Mid-2004 is affected with this bug. The
machine freezes at boot due to the new ppc64 kernel.
Thanks for the reports!

So it's not all G5, but it's across generations - the iMac is PCIe.
Perhaps multiprocessors?

Could you share the screen/log of the crash?

For my G5 with 5.5rc1 I have one, but the photo is terrible:
<http://www.dolbeau.name/dolbeau/files/Photo0031.jpg>
Timestamps overlap, as after the 'crash' (backtrace) there were
more messages from the (S)ATA subsystem.

Cordially,
--
Romain Dolbeau
Christian Marillat
2019-12-12 08:50:01 UTC
Post by Romain Dolbeau
Post by jjhdiederen
PowerMac 7,3 G5 2.5 DP PCI-X Mid-2004 is affected with this bug. The
machine freezes at boot due to the new ppc64 kernel.
Thanks for the reports!
So it's not all G5, but it's across generations - the iMac is PCIe.
Perhaps multiprocessors?
Could you share the screen/log of the crash?
<http://www.dolbeau.name/dolbeau/files/Photo0031.jpg>
Timestamps overlap, as after the 'crash' (backtrace) there was
more messages from the (S)ATA subsystem.
Mine are still here :

[two image links, not preserved]

Christian
Romain Dolbeau
2019-12-12 18:20:01 UTC
On Thu, 12 Dec 2019 at 09:08, John Paul Adrian Glaubitz
I suggest booting the machine with a netconsole to get a dump of the crash
over the network, see [1].
I added netconsole (not as a module, directly in the kernel), but I
get nothing on my receiver 'nc'.
I don't see the network interface identified anywhere prior to the
crash, which could explain it
(I might also have misconfigured the command-line...). The crash is
probably too early.
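
For the receiving side, a small Python UDP listener can stand in for 'nc'. This is only a sketch: netconsole sends plain UDP datagrams and 6666 is its documented default target port, but the bind address and the helper names `open_listener` and `read_messages` are made up for illustration. It cannot help if the crash happens before the network driver is up, which is the suspicion here.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Open a UDP socket on the port the netconsole target is set to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams; raises socket.timeout if none arrive."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        data, _sender = sock.recvfrom(65535)
        messages.append(data.decode("utf-8", errors="replace"))
    return messages


# Typical use on the receiving machine (left commented out on purpose):
#   for line in read_messages(open_listener(), count=1000, timeout=120):
#       print(line, end="")
```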

I also tried compiling w/o SMP to see if that changed anything, but
the kernel won't compile:
first the ppc_watchdog seems to rely on a SMP symbol, and once I
disabled it, the MM subsystem was missing some 'numa' symbols.

Cordially,
--
Romain Dolbeau
Andreas Schwab
2019-12-12 18:50:01 UTC
Same question to anyone else with a G5 / PPC970 - what is it and does
it boot recent PPC64 Linux kernel ?
My PowerMac7,3 (DP 2.0GHz) can boot 5.5-rc1 without issues.

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Romain Dolbeau
2019-12-12 18:50:01 UTC
Post by Andreas Schwab
My PowerMac7,3 (DP 2.0GHz) can boot 5.5-rc1 without issues.
This is weird, as except for the frequency it should be the same
system as Jeroen's crashing G5!

Can you share your kernel config, compiler version, and other details?
Perhaps even the binary...

Thanks & cordially,
--
Romain Dolbeau
Romain Dolbeau
2019-12-13 07:50:01 UTC
I'm using 4K pages, in case that matters
Yes it does matter, as it seems to be the difference between "working"
and "not working" :-)
Thank you for the config & pointing out the culprit!

With your config, my machine boots (though it's missing some features
as the config seems quite tuned).

Moving from 64k pages to 4k pages on 'my' config (essentially,
Debian's 5.3 with default values for changes since), my machine boots
as well & everything seems to work fine.

So question to Aneesh - did you try 64k pages on your G5, or only 4k?
In the second case, could you try with 64k to see if you can reproduce
the crash?

To Jeroen - is your iMac booting with 4k or 64k pages? Same question
for the crashing G5, though I assume the answer is going to be 64k
there.

Thanks & cordially,
--
Romain Dolbeau
Jeroen Diederen
2019-12-14 09:40:02 UTC
on my iMac iSight:


grep CONFIG_PPC.*PAGE config-5.3.0-3-powerpc64
# CONFIG_PPC_4K_PAGES is not set
CONFIG_PPC_64K_PAGES=y
CONFIG_PPC_PAGE_SHIFT=16
# CONFIG_PPC_SUBPAGE_PROT is not set
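
For anyone comparing configs across machines, the page size follows directly from CONFIG_PPC_PAGE_SHIFT (page size = 2^shift bytes). A small Python sketch, assuming the config file has the same layout as the grep output above; the helper name `page_size_from_config` is made up for illustration.

```python
import re


def page_size_from_config(config_text):
    """Return the page size in bytes implied by CONFIG_PPC_PAGE_SHIFT, or None."""
    match = re.search(r"^CONFIG_PPC_PAGE_SHIFT=(\d+)$", config_text, re.MULTILINE)
    return (1 << int(match.group(1))) if match else None


sample = """# CONFIG_PPC_4K_PAGES is not set
CONFIG_PPC_64K_PAGES=y
CONFIG_PPC_PAGE_SHIFT=16
# CONFIG_PPC_SUBPAGE_PROT is not set
"""
print(page_size_from_config(sample))  # 65536, i.e. 64 KiB pages
```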

I can't give you info about the G5 PowerMac 7,3 as it is not my machine.

Regards,
Jeroen

On Fri, 13 Dec 2019 08:47:49 +0100
Post by Romain Dolbeau
I'm using 4K pages, in case that matters
Yes it does matter, as it seems to be the difference between "working"
and "not working" :-)
Thank you for the config & pointing out the culprit!
With your config, my machine boots (though it's missing some features
as the config seems quite tuned).
Moving from 64k pages to 4k pages on 'my' config (essentially,
Debian's 5.3 with default values for changes since), my machine boots
as well & everything seems to work fine.
So question to Aneesh - did you try 64k pages on your G5, or only 4k?
In the second case, could you try with 64k to see if you can reproduce
the crash?
To Jeroen - is your iMac booting with 4k or 64k pages? Same question
for the crashing G5, though I assume the answer is going to be 64k
there.
Thanks & cordially,
--
Romain Dolbeau
--
Jeroen Diederen <***@zonnet.nl>
Bertrand Dekoninck
2020-01-05 15:10:03 UTC
Post by Jeroen Diederen
grep CONFIG_PPC.*PAGE config-5.3.0-3-powerpc64
# CONFIG_PPC_4K_PAGES is not set
CONFIG_PPC_64K_PAGES=y
CONFIG_PPC_PAGE_SHIFT=16
# CONFIG_PPC_SUBPAGE_PROT is not set
I can't give you info about the G5 PowerMac 7,3 as it is not my machine.
I can now test on powermac 7,3 (with an ATI card)
How can I build a deb package of this kernel ? Or is there a package to download somewhere ?
Bertrand
Romain Dolbeau
2020-01-06 18:40:02 UTC
On Sun, 5 Jan 2020 at 16:06, Bertrand Dekoninck
Post by Bertrand Dekoninck
I can now test on powermac 7,3 (with an ATI card)
How can I build a deb package of this kernel ? Or is there a package to download somewhere ?
I usually cross-compile on x86-64 from upstream sources. On a Debian
Buster with the powerpc tools installed,
it's just:

#####
make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- oldconfig && nice
-19 make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- -j56
bindeb-pkg
#####

(alter the -j56 for your own build system). For the dependency, as far
as I remember I only needed "gcc-powerpc64-linux-gnu" and
dependencies. My '.config' is Debian's 5.3 plus default values for
changes - with the exception of 4 KiB pages.

I've also uploaded the working kernel with 4 KiB pages DEB here:
<http://dl.free.fr/otB1KMEMR>, as it might be easier for a quick test.

Cordially,
--
Romain Dolbeau
Bertrand
2020-01-07 15:30:02 UTC
Post by Romain Dolbeau
On Sun, 5 Jan 2020 at 16:06, Bertrand Dekoninck
Post by Bertrand Dekoninck
I can now test on powermac 7,3 (with an ATI card)
How can I build a deb package of this kernel ? Or is there a package to download somewhere ?
I've tested the 5.5-rc package from the link you gave below. When
you said "working kernel", did you mean it would not crash ? It doesn't
for me. I can boot successfully.


The only weirdness I could notice is that my swap space wasn't mounted.
When I tried  to run swapon, I was given the following (roughly
translated from french) :

swapon: /dev/sdb5 : pagesize doesn't fit with space space format

              unable to find swap-space signature

So I ran mkswap on the partition, I could then run swapon successfully.

But, when I boot the old kernel ( 5.3), I've got the same error again on
swap and I have to format the swap space to mount it manually.

There's something wrong on swap page size between those two kernels.
Post by Romain Dolbeau
I usually cross-compile on x86-64 from upstream sources. On a Debian
Buster with the powerpc tools installed,
#####
make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- oldconfig && nice
-19 make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- -j56
bindeb-pkg
####
(alter the -j56 for your own build system). For the dependency, as far
as I remember I only needed "gcc-powerpc64-linux-gnu" and
dependencies. My '.config' is Debian's 5.3 plus default values for
changes - with the exception of 4 KiB pages.
I should use the same and the same config file also.
I'll try this later. Should I download the 5.5-rc1 kernel source or
something else ?
Post by Romain Dolbeau
<http://dl.free.fr/otB1KMEMR>, as it might be easier for a quick test.
Cordially,
Cheers,

Bertrand
Bertrand
2020-01-07 16:00:01 UTC
Oops. Correction:

swapon: /dev/sdb5 : pagesize doesn't fit with _swap_ space format
Post by Bertrand
swapon: /dev/sdb5 : pagesize doesn't fit with space space format
Romain Dolbeau
2020-01-07 17:20:01 UTC
Post by Bertrand
I've tested the 5.5-rc package with the link you gave hereafter. When
you said "working kernel", did you mean it would not crash ? It doesn't
for me. I can boot successfully.
Yes, unlike the default Debian build, this one has 4 KiB page size.
It's not a solution, but it's a workaround for those affected by the bug
and perhaps a clue to the issue.

As your machine is booting regular 5.3 as well and seems not affected by
the bug, can you share some details about it? (exact model, number
of cores, ...).
Post by Bertrand
The only weirdness I could notice is that my swap space wasn't mounted.
No surprise, I think the on-disk format for the swap space hardwires
the page size. There's even a '-p' option to 'mkswap' to specify a
page size when creating the swap space.
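
To illustrate why 'swapon' cannot find the signature after switching page size: in the Linux swap format the "SWAPSPACE2" signature sits in the last 10 bytes of the first page, so its offset depends on the page size used by 'mkswap'. A minimal Python sketch of that check; the helper name `swap_page_size` and the list of candidate sizes are assumptions for illustration.

```python
SWAP_MAGIC = b"SWAPSPACE2"  # signature of the current Linux swap format


def swap_page_size(first_bytes, candidates=(4096, 8192, 16384, 32768, 65536)):
    """Guess the page size a swap area was formatted with.

    The signature occupies the last 10 bytes of the first page, so the
    offset at which it is found reveals the page size used by mkswap.
    """
    for size in candidates:
        if first_bytes[size - len(SWAP_MAGIC):size] == SWAP_MAGIC:
            return size
    return None


# Typical use, as root (left commented out on purpose):
#   with open("/dev/sdb5", "rb") as dev:
#       print(swap_page_size(dev.read(65536)))
```

A 4 KiB kernel looks for the signature at offset 4086, while a swap area made under a 64 KiB kernel has it at offset 65526, hence the need to re-run 'mkswap' each time the page size changes.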

Cordially,

--
Romain Dolbeau
Bertrand
2020-01-07 18:00:01 UTC
Post by Romain Dolbeau
Post by Bertrand
I've tested the 5.5-rc package with the link you gave hereafter. When
you said "working kernel", did you mean it would not crash ? It doesn't
for me. I can boot successfully.
As your machine is booting regular 5.3 as well and seems not affected by
the bug, can you share some details about it? (exact model, number
of cores, ...).
It's this one :
https://everymac.com/systems/apple/powermac_g5/specs/powermac_g5_2.0_dp_pci.html

PowerMac G5 PCI, two 2 GHz CPUs, Radeon 9600, with 2 GB of RAM. I can send
dmesg or something if you want.

Bertrand
Romain Dolbeau
2020-01-07 18:40:01 UTC
Hello all,

Great news: Aneesh sent me a patch that solves the problem on my G5 Quad :-)

I don't know whether he considers it a 'workaround' or if it's the
'proper' patch for upstream, so beware. It's a one-liner so I attach
it to this message: those affected can test it as well to confirm if
that indeed solves the problem for them as well.

If it is the 'proper' patch, I expect it should get into upstream
pretty quickly. Then Debian should be able to trivially backport it to
their PPC64 kernel, and the G5 will still be a great machine in 2020!

Thanks everyone for your help in tracking down the bug & to Aneesh for
finding a fix :-)

Cordially,
--
Romain Dolbeau
Christian Marillat
2019-12-14 10:10:02 UTC
Post by Romain Dolbeau
I'm using 4K pages, in case that matters
Yes it does matter, as it seems to be the difference between "working"
and "not working" :-)
Thank you for the config & pointing out the culprit!
That solved the problem for me.

kernel 5.3.16 built with make-kpkg and the kernel config file from
latest Debian kernel linux-image-5.3.0-3-amd64

Christian
Aneesh Kumar K.V
2019-12-21 05:20:02 UTC
Post by Romain Dolbeau
I'm using 4K pages, in case that matters
Yes it does matter, as it seems to be the difference between "working"
and "not working" :-)
Thank you for the config & pointing out the culprit!
With your config, my machine boots (though it's missing some features
as the config seems quite tuned).
Moving from 64k pages to 4k pages on 'my' config (essentially,
Debian's 5.3 with default values for changes since), my machine boots
as well & everything seems to work fine.
So question to Aneesh - did you try 64k pages on your G5, or only 4k?
In the second case, could you try with 64k to see if you can reproduce
the crash?
I don't have direct access to this system, I have asked if we can get a run
with 64K.

Meanwhile, is there a way to find out what caused the machine check, or
to get more details on it? I was checking the manual and I don't see any
restrictions w.r.t. effective address. We now have very high EA with 64K
page size.

-aneesh
Romain Dolbeau
2020-01-05 13:10:02 UTC
On Sat, 21 Dec 2019 at 05:31, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
I don't have direct access to this system, I have asked if we can get a run
with 64K.
OK, thanks! Do you know which model it is? It seems to be working on
some systems,
but we don't have enough samples to figure out why at this time, I think.
Post by Aneesh Kumar K.V
Meanwhile is there a way to find out what caused MachineCheck? more
details on this? I was checking the manual and I don't see any
restrictions w.r.t effective address. We now have very high EA with 64K
page size.
Sorry, no idea, completely out of my depth here. I can try some kernel
(build, runtime) options and/or patch, but someone will have to tell
me what to try,
as I have no ideas.

Cordially & Happy New Year!
--
Romain Dolbeau
Aneesh Kumar K.V
2020-01-06 14:30:02 UTC
Post by Romain Dolbeau
On Sat, 21 Dec 2019 at 05:31, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
I don't have direct access to this system, I have asked if we can get a run
with 64K.
OK, thanks! Do you know which model it is? It seems to be working on
some systems,
but we don't have enough samples to figure out why at this time, I think.
Post by Aneesh Kumar K.V
Meanwhile is there a way to find out what caused MachineCheck? more
details on this? I was checking the manual and I don't see any
restrictions w.r.t effective address. We now have very high EA with 64K
page size.
Sorry, no idea, completely out of my depth here. I can try some kernel
(build, runtime) options and/or patch, but someone will have to tell
me what to try,
as I have no ideas.
Can you try this change.

modified arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -580,7 +580,7 @@ extern void slb_set_size(u16 size);
#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
#else
-#define MAX_KERNEL_CTX_CNT 1
+#define MAX_KERNEL_CTX_CNT 4
#endif

#define MAX_VMALLOC_CTX_CNT 1
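
To make the effect of this one-liner concrete, the macro logic can be mirrored in a few lines of Python. This is only an illustration of the arithmetic: the bit widths passed in below are hypothetical example values, not figures confirmed in this thread.

```python
def max_kernel_ctx_cnt(max_physmem_bits, max_ea_bits_per_context, fallback):
    """Mirror of the MAX_KERNEL_CTX_CNT preprocessor logic in the patch."""
    if max_physmem_bits > max_ea_bits_per_context:
        # #if branch: untouched by the patch.
        return 1 << (max_physmem_bits - max_ea_bits_per_context)
    # #else branch: 1 before the patch, 4 with it.
    return fallback


# Hypothetical bit widths, chosen only to exercise both branches.
before = max_kernel_ctx_cnt(46, 49, fallback=1)      # #else branch, old value
after = max_kernel_ctx_cnt(46, 49, fallback=4)       # #else branch, patched value
unaffected = max_kernel_ctx_cnt(52, 49, fallback=4)  # #if branch, 1 << 3
```

In other words, the change only matters for configurations where MAX_PHYSMEM_BITS is not larger than MAX_EA_BITS_PER_CONTEXT.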


-aneesh
Romain Dolbeau
2020-01-06 18:20:02 UTC
On Mon, 6 Jan 2020 at 15:06, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
Can you try this change.
Applied, recompiled with 64 KiB pages, still crashes.

The backtrace seems more readable this time (and wasn't overwritten by
something else), bad photo here:
<http://www.dolbeau.name/dolbeau/files/Photo0033.jpg>

Cordially,
--
Romain Dolbeau
Lennart Sorensen
2020-01-06 19:10:01 UTC
Post by Romain Dolbeau
Applied, recompiled with 64 KiB pages, still crashes.
The backtrace seems more readable this time (and wasn't overwritten by
<http://www.dolbeau.name/dolbeau/files/Photo0033.jpg>
Is it possible this has to do with nouveau and not supporting 64K page
size on older nvidia chips? My reading of the driver is that only
NV50 and above has implemented support for anything other than 4K pages,
so a geforce 6xxx series that I believe some of the G5 machines had would
be a problem with 64K pages, while those with ATI cards would probably
not have a problem.

Maybe I read the driver changes wrong, but it sure looks like only
NV50/G84 and up got the needed fixes a couple of years ago.
--
Len Sorensen
Romain Dolbeau
2020-01-06 19:20:01 UTC
On Mon, 6 Jan 2020 at 19:54, Lennart Sorensen
Post by Lennart Sorensen
Is it possible this has to do with nouveau and not supporting 64K page
size on older nvidia chips?
Interesting idea (and I have a 6600 aka NV43 in there, indeed) but I
don't think so, as
a) 'nouveau' works in 4.19 with 64 KiB pages
b) using "module_blacklist=nouveau" doesn't help, I just tried
c) my original 'bisect' was probably using 'nouveau' when the kernel
was booting, so at least some 5.x w/o the offending commit and 64 KiB
pages is fine
d) to my untrained eye, the crash happens _before_ nouveau is loaded
(it seems to me I'm still on the OpenFirmware framebuffer, font change
occurs later).

Unfortunately I don't have a PCIe OpenFirmware ATI card to test the
theory further.
(... well I _do_ have a Sun XVR-300 ... technically it fits the bill ... )

Cordially,
--
Romain Dolbeau
Lennart Sorensen
2020-01-06 19:30:02 UTC
Post by Romain Dolbeau
Interesting idea (and I have a 6600 aka NV43 in there, indeed) but I
don't think so, as
a) 'nouveau' works in 4.19 with 64 KiB pages
b) using "module_blacklist=nouveau" doesn't help, I just tried
c) my original 'bisect' was probably using 'nouveau' when the kernel
was booting, so at least some 5.x w/o the offending commit and 64 KiB
pages is fine
d) to my untrained eye, the crash happens _before_ nouveau is loaded
(it seems to me I'm still on the OpenFirmware framebuffer, font change
occurs later).
Unfortunately I don't have a PCIe OpenFirmware ATI card to test the
theory further.
(... well I _do_ have a Sun XVR-300 ... technically it fits the bill ... )
Oh well. I guess that means they did fix it for all cards and I just
don't see which change was relevant for the older chips then.

Unless something was missed that only triggers occasionally. That would
be annoying.
--
Len Sorensen
Michel Dänzer
2020-01-07 09:50:01 UTC
Post by Romain Dolbeau
Unfortunately I don't have a PCIe OpenFirmware ATI card to test the
theory further.
FWIW, a non-OF Radeon >= R(V)5xx card should work in Linux (though
obviously won't light up in OF itself or MacOS).
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
Michael Ellerman
2020-01-07 01:10:02 UTC
Post by Aneesh Kumar K.V
Post by Romain Dolbeau
On Sat, 21 Dec 2019 at 05:31, Aneesh Kumar K.V
Post by Aneesh Kumar K.V
I don't have direct access to this system, I have asked if we can get a run
with 64K.
OK, thanks! Do you know which model it is? It seems to be working on
some systems,
but we don't have enough samples to figure out why at this time, I think.
Post by Aneesh Kumar K.V
Meanwhile is there a way to find out what caused MachineCheck? more
details on this? I was checking the manual and I don't see any
restrictions w.r.t effective address. We now have very high EA with 64K
page size.
Sorry, no idea, completely out of my depth here. I can try some kernel
(build, runtime) options and/or patch, but someone will have to tell
me what to try,
as I have no ideas.
Can you try this change.
modified arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -580,7 +580,7 @@ extern void slb_set_size(u16 size);
#if (MAX_PHYSMEM_BITS > MAX_EA_BITS_PER_CONTEXT)
#define MAX_KERNEL_CTX_CNT (1UL << (MAX_PHYSMEM_BITS - MAX_EA_BITS_PER_CONTEXT))
#else
-#define MAX_KERNEL_CTX_CNT 1
+#define MAX_KERNEL_CTX_CNT 4
#endif
Didn't help.

Same crash, here's a previous one OCR'ed from a photo:

Oops: Machine check, sig: 7 [#1]
BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=4 NUMA PowerMac
Modules linked in:
CPU: PID: 1 Comm: init Tainted: G M 5.5.0-rc4-gcc-8.2.0-00919-g443b9413a05e #1465
NIP: c00000000026f528 LR: c000000000296138 CTR: 0000000000000000
REGS: c00000000ffa3d70 TRAP: 0200 Tainted: G M (5.5.0-rc4-gcc-8.2.0-00919-g443b9413a05e)
MSR: 9000000000109032 <SF, HV, EE, ME, IR, DR, RI> CR: 24282048 XER: 00000000
DAR: c00c000000612c80 DSISR: 00000400 IRQ MASK: 8
GPR00: c0000000002970d0 c0000001bc343a90 c000000001399600 c0000001bc01c800
GPR04: c0000001bc390000 c0000001bc3439d4 c0000001bc343a9c c0000001bb4b73b8
GPR08: c0000001bc320000 0000000000612c78 c000000001442a98 0000000000000fe0
GPR12: 7f7f7f7f7f7f7f7f c0000000016a0000 0000000000000000 00000000f7df38c8
GPR16: 0000000000000002 0000000000000000 00000000ffb14bac 00000000f7df5690
GPR20: 00000000f7df26c4 000000000000000d 0000000000008000 00000000f7ddfc0c
GPR24: ffffffffffffff9c 0000000000000010 0000000000000002 c0000001bc343db8
GPR28: c000000000296138 c00c000000612c78 c0000001bc390000 c0000001bc01c800
NIP [c00000000026f528] .kmem_cache_free+0x58/0x140
LR [c000000000296138] .putname+0x88/0xa0
Call Trace:
c0000001bc343b40 [c000000000296138] .putname+0x88/0xa0
c0000001bc343bc0 [c0000000002970d0] .filename_lookup.part.76+0xb0/0x160
c0000001bc343d40 [c000000000279b20] .do_faccessat+0xe0/0x380
c0000001bc343e20 [c00000000000a40c] system_call+0x5c/0x68
Instruction dump:
408201e8 2fa30000 419e0080 fb8100c0 fb410080 fb610088 789d8502 3d22000b
39499498 1d3d0038 ebaa0000 7fbd4a14 <e93d0008> 712a0001 40820240 3422001e
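
The bracketed word in the instruction dump is the faulting instruction, and it can be cross-checked against the registers. A short Python sketch, assuming the OCR of that word and of GPR29 is accurate, and using the Power ISA DS-form field layout (primary opcode 58 with extended opcode 0 is `ld`); the helper name `decode_ds_form` is made up for illustration.

```python
def decode_ds_form(word):
    """Split a 32-bit DS-form instruction into (opcode, rt, ra, disp, xo)."""
    opcode = word >> 26            # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    ds = (word >> 2) & 0x3FFF      # 14-bit signed displacement field
    if ds & 0x2000:
        ds -= 0x4000               # sign-extend
    xo = word & 0x3                # extended opcode: 0 = ld, 1 = ldu, 2 = lwa
    return opcode, rt, ra, ds << 2, xo


opcode, rt, ra, disp, xo = decode_ds_form(0xE93D0008)
gpr29 = 0xC00C000000612C78         # value of GPR29 in the register dump
effective_address = gpr29 + disp
```

This decodes to `ld r9, 8(r29)`, and r29 + 8 gives c00c000000612c80, which matches the reported DAR: the machine check was raised by a load from that address inside kmem_cache_free.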


cheers
John Paul Adrian Glaubitz
2024-08-06 21:00:01 UTC
Hi,
Post by Romain Dolbeau
I'm using 4K pages, in case that matters
Yes it does matter, as it seems to be the difference between "working"
and "not working" :-)
Thank you for the config & pointing out the culprit!
With your config, my machine boots (though it's missing some features
as the config seems quite tuned).
Moving from 64k pages to 4k pages on 'my' config (essentially,
Debian's 5.3 with default values for changes since), my machine boots
as well & everything seems to work fine.
Just as a heads-up: Starting with kernel 6.10, Debian now has separate kernel
packages for 4k and 64k page support for both ppc64 and ppc64el [1].

So, anyone who has run into this issue in the past has a chance to try again
with kernel 6.10.

Thanks,
Adrian
Post by Romain Dolbeau
[1] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/changelog?ref_type=heads
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913