Bug#1023649: ghc: FTBFS haskell-random powerpc (ghc Segmentation fault)
John Paul Adrian Glaubitz
2023-09-16 06:30:01 UTC
Hi!

Note that this issue also prevents us from building the latest version of the
GHC compiler [1]. I have tried to cross-compile GHC 9.4.6 to work around this
issue and also tried building an unregisterised compiler, both without success.

I have forwarded the issue upstream [2]. I think the only way forward will be to
bisect the problem unless a GHC expert can figure out what the problem is.

Adrian
[1] https://buildd.debian.org/status/package.php?p=ghc&suite=sid
[2] https://gitlab.haskell.org/ghc/ghc/-/issues/23969
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
John Paul Adrian Glaubitz
2023-09-20 10:20:01 UTC
Hello!

I have been bisecting this issue but in order to be successful, I need a simple
reproducer which isn't trivial since I cannot just reuse the build directory of
an unsuccessful build due to the changing Haskell libraries for different GHC
versions.

Ideally, we should have a single command line with GHC which will trigger the
segmentation fault.

To bisect, I have done the following:

# git bisect start
# git bisect good ghc-8.10.7-release
# git bisect bad ghc-9.2.7-release
# git submodule update --init
# ./boot ; python3 boot
# echo "SRC_HC_OPTS += -lffi -latomic -optl-pthread" >> mk/build.mk && \
echo "Stage1Only := YES" >> mk/build.mk && \
echo 'utils/genprimopcode_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo 'compiler_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo 'utils/hpc_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo "HADDOCK_DOCS := NO" >> mk/build.mk \
&& echo "BUILD_SPHINX_HTML := NO" >> mk/build.mk && echo "BUILD_SPHINX_PDF := NO" \
>> mk/build.mk
# ./configure && make -j32

For newer versions, the build has to be performed with Hadrian, so the last step
would be:

# ./hadrian/build -j

Prior to using Hadrian for the first time, the package database needs to be updated:

# cabal update

Now, if I had a simple reproducer, it would be rather easy to bisect the issue.

Adrian
Ilias Tsitsimpis
2023-09-20 12:10:01 UTC
Hi Adrian,

Thanks for working on this, comments inline.
Post by John Paul Adrian Glaubitz
I have been bisecting this issue but in order to be successful, I need a simple
reproducer which isn't trivial since I cannot just reuse the build directory of
an unsuccessful build due to the changing Haskell libraries for different GHC
versions.
Ideally, we should have a single command line with GHC which will trigger the
segmentation fault.
Are you bisecting the segfault issue? If yes, then a simple reproducer
would be to try and compile haskell-random.

Since you already have cabal-install on your system, you can do
something like:

$ cabal install --with-ghc=/usr/bin/ghc --with-ghc-pkg=/usr/bin/ghc-pkg random-1.2.1.1

(replace ghc and ghc-pkg with the ones you have built).
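
To drive the bisection automatically, the reproducer can be wrapped in a script for `git bisect run`. The following is only a sketch under stated assumptions: the function name `classify` and the log file are hypothetical, and it assumes the crash shows up as the text "Segmentation fault" in the captured build output. `git bisect run` itself treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad.

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` script.
#
# classify STATUS LOGFILE
#   STATUS  - exit status of the reproducer (e.g. the cabal install above)
#   LOGFILE - captured stdout/stderr of that reproducer
# Prints the exit code the bisect script should return.
classify() {
    status=$1
    logfile=$2
    if [ "$status" -eq 0 ]; then
        echo 0      # reproducer built fine: good commit
    elif grep -q 'Segmentation fault' "$logfile"; then
        echo 1      # assumed signature of the crash we are hunting: bad commit
    else
        echo 125    # unrelated failure (e.g. GHC itself did not build): skip
    fi
}
```

A bisect script would build GHC for the current commit, run the reproducer with its output redirected to a log file, and finish with `exit "$(classify "$status" build.log)"`, where `status` holds the exit status of the reproducer.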
Post by John Paul Adrian Glaubitz
# git bisect start
# git bisect good ghc-8.10.7-release
# git bisect bad ghc-9.2.7-release
Since this issue is also present in ghc-9.0.2, maybe we can start from
there.
Post by John Paul Adrian Glaubitz
# git submodule update --init
# ./boot ; python3 boot
# echo "SRC_HC_OPTS += -lffi -latomic -optl-pthread" >> mk/build.mk && \
echo "Stage1Only := YES" >> mk/build.mk && \
echo 'utils/genprimopcode_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo 'compiler_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo 'utils/hpc_CONFIGURE_OPTS += "-f-build-tool-depends"' \
>> mk/build.mk && echo "HADDOCK_DOCS := NO" >> mk/build.mk \
&& echo "BUILD_SPHINX_HTML := NO" >> mk/build.mk && echo "BUILD_SPHINX_PDF := NO" \
>> mk/build.mk
# ./configure && make -j32
For newer versions, the build has to be performed with Hadrian, so the last step would be:
# ./hadrian/build -j
Keep in mind that hadrian doesn't take into account 'mk/build.mk'. You
will have to configure hadrian in the same way, see also
https://www.haskell.org/ghc/blog/20220805-make-to-hadrian.html.
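
As a partial illustration, the documentation-related settings from mk/build.mk can be expressed as Hadrian command-line arguments. This is a sketch only: the helper name `hadrian_args` is hypothetical, and it covers only the three documentation switches. The `Stage1Only`, `SRC_HC_OPTS` and `*_CONFIGURE_OPTS` lines are not translated here and would still need Hadrian equivalents.

```shell
#!/bin/sh
# Hypothetical helper: print Hadrian arguments that roughly correspond to
# the documentation settings appended to mk/build.mk for the make build.
#
# hadrian_args JOBS
hadrian_args() {
    jobs=$1
    # --docs=none turns off both Haddock and Sphinx documentation, covering
    # HADDOCK_DOCS := NO, BUILD_SPHINX_HTML := NO and BUILD_SPHINX_PDF := NO.
    echo "-j$jobs --docs=none"
}
```

It could then be used as `./hadrian/build $(hadrian_args 32)`.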


Let me summarize the current state to make sure we are not missing
anything:

1. GHC 9.0.2 with the native code generator is currently broken on
powerpc and segfaults while building (at least) haskell-random and
GHC-9.4.6. Strangely enough, we can compile GHC 9.0.2 itself.

2. An unregisterised GHC 9.0.2 is *also* broken on powerpc, producing
code that overflows integers. We are also seeing unregisterised GHC
9.4.6 on i386 being broken, since the tests for haskell-quickcheck fail
(see https://buildd.debian.org/status/package.php?p=haskell-quickcheck&suite=sid).
The plan for i386 is to registerise GHC and use the LLVM backend by
default (to avoid the baseline violation).

3. We cannot cross-compile GHC 9.4 and newer any more (we are discussing
this here as well https://gitlab.haskell.org/ghc/ghc/-/issues/23975).

Given the above, I can think of the following:

1. Fix the native code generator in GHC 9.0.2
2. Fix unregisterised GHCs on 32-bit architectures
3. Try and use the LLVM backend in GHC 9.0.2 on powerpc, and see if this
produces valid code and allows us to compile GHC 9.4.6.

For the record, I have started working on migrating GHC in Debian to use
the new Hadrian build system, you can find the latest code here
https://salsa.debian.org/haskell-team/DHG_packages/-/tree/hadrian. I am
in a really good state right now where I can build GHC, and I am doing a
lot of tests to verify I haven't missed anything. If you are working on
GHC right now as well, I would appreciate it if you could take a look
and/or start using this branch for all your tests, so we catch any
errors early.

Best,
--
Ilias
John Paul Adrian Glaubitz
2024-06-13 07:50:01 UTC
Hi,

I finally figured out what the problem is.

After realizing that the two-stage build of GHC works without problems,
I realized it can be a configuration issue only and, indeed, it is.

Looking at /usr/lib/ghc/lib/settings, the default linker is set to gold:

"C compiler link flags", "-fuse-ld=gold"

Since gold is broken on powerpc and shouldn't really be used anymore since
it's basically unmaintained upstream, we must use bfd on powerpc by default.

Editing the file and switching back to bfd fixes the problem for me.
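
For anyone who wants to try the same local workaround, the edit can be scripted. This is a minimal sketch, not the packaging fix: the function name `use_bfd` is hypothetical, and it simply assumes the settings file contains the literal text `-fuse-ld=gold` as in the line shown above. It keeps a backup copy next to the original.

```shell
#!/bin/sh
# Hypothetical helper: switch GHC's settings file from gold to bfd.
#
# use_bfd SETTINGS_FILE
#   Replaces every occurrence of -fuse-ld=gold with -fuse-ld=bfd and
#   leaves the untouched original in SETTINGS_FILE.orig.
use_bfd() {
    settings=$1
    cp "$settings" "$settings.orig" &&
    sed 's/-fuse-ld=gold/-fuse-ld=bfd/g' "$settings.orig" > "$settings"
}
```

On an installed system this would be run as root, e.g. `use_bfd /usr/lib/ghc/lib/settings`. A package upgrade will overwrite the file again.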

Now we just need to figure out how to actually set the default linker back
to bfd as it was actually explicitly supposed to happen according to
debian/rules.

This will most likely also unbreak GHC on m68k.

Adrian
Jeffrey Walton
2024-06-13 08:00:01 UTC
On Thu, Jun 13, 2024 at 3:41 AM John Paul Adrian Glaubitz
Post by John Paul Adrian Glaubitz
I finally figured out what the problem is.
After realizing that the two-stage build of GHC works without problems,
I realized it can be a configuration issue only and, indeed, it is.
"C compiler link flags", "-fuse-ld=gold"
Since gold is broken on powerpc and shouldn't really be used anymore since
it's basically unmaintained upstream, we must use bfd on powerpc by default.
Editing the file and switching back to bfd fixes the problem for me.
Now we just need to figure out how to actually set the default linker back
to bfd as it was actually explicitly supposed to happen according to
debian/rules.
This will most likely also unbreak GHC on m68k.
Good job, Adrian. That's quite a bit of work to track down the issue.

Jeff
John Paul Adrian Glaubitz
2024-06-13 08:20:01 UTC
Hi Jeff,
Post by Jeffrey Walton
Post by John Paul Adrian Glaubitz
Now we just need to figure out how to actually set the default linker back
to bfd as it was actually explicitly supposed to happen according to
debian/rules.
This will most likely also unbreak GHC on m68k.
Good job, Adrian. That's quite a bit of work to track down the issue.
Thanks. In the meantime I filed a bug upstream for this [1].

I will actually open a second bug report since this bug report is about
the broken NCG on 32-bit PowerPC which is a different issue.

Adrian
[1] https://gitlab.haskell.org/ghc/ghc/-/issues/24986
Ilias Tsitsimpis
2024-06-13 18:40:01 UTC
Hey Adrian,
Post by John Paul Adrian Glaubitz
Hi Jeff,
Post by John Paul Adrian Glaubitz
Now we just need to figure out how to actually set the default linker back
to bfd as it was actually explicitly supposed to happen according to
debian/rules.
This will most likely also unbreak GHC on m68k.
Great job!

I completely missed the fact this needs to be passed to the bindist's
configure script as well.

You need to edit debian/rules here
https://sources.debian.org/src/ghc/9.4.7-5/debian/rules/#L78
and add the following line as well:

+ EXTRA_INSTALL_CONFIGURE_FLAGS += --disable-ld-override

I will include that in the next upload. Do we still have to build an
unregisterised compiler for powerpc or can we switch back to NCG
(https://bugs.debian.org/1060196)?
--
Ilias
John Paul Adrian Glaubitz
2024-06-13 19:00:01 UTC
Hi Ilias,
Post by Ilias Tsitsimpis
Great job!
Thanks!
Post by Ilias Tsitsimpis
I completely missed the fact this needs to be passed to the bindist's
configure script as well.
It took me forever to figure this out ;-).
Post by Ilias Tsitsimpis
You need to edit debian/rules here
https://sources.debian.org/src/ghc/9.4.7-5/debian/rules/#L78
+ EXTRA_INSTALL_CONFIGURE_FLAGS += --disable-ld-override
Yes, that's what I suggested, see my patch in [1].
Post by Ilias Tsitsimpis
I will include that in the next upload.
Great, thank you. I uploaded a patched version to unreleased in the
meantime.
Post by Ilias Tsitsimpis
Do we still have to build an unregisterised compiler for powerpc
or can we switch back to NCG (https://bugs.debian.org/1060196)?
I have not verified that yet. Please let's stay unregisterised for now
and have me verify first whether the NCG has been fixed with 9.6.x or
newer.

Please let's keep this bug report open and use #1073159 [1] for adding
the flag --disable-ld-override to EXTRA_INSTALL_CONFIGURE_FLAGS.

Adrian
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1073159
John Paul Adrian Glaubitz
2024-06-14 08:40:01 UTC
Hi Ilias,
Post by John Paul Adrian Glaubitz
Post by Ilias Tsitsimpis
Do we still have to build an unregisterised compiler for powerpc
or can we switch back to NCG (https://bugs.debian.org/1060196)?
I have not verified that yet. Please let's stay unregisterised for now
and have me verify first whether the NCG has been fixed with 9.6.x or
newer.
Please let's keep this bug report open and use #1073159 [1] for adding
the flag --disable-ld-override to EXTRA_INSTALL_CONFIGURE_FLAGS.
GHC 9.6.5 still fails on powerpc with the NCG enabled:

# rts/include/rts/prof/LDV.h
| Run Ghc CompileHs Stage1: rts/HeapStackCheck.cmm => _build/stage1/rts/build/cmm/HeapStackCheck.debug_o
| Run Ghc CompileHs Stage1: rts/PrimOps.cmm => _build/stage1/rts/build/cmm/PrimOps.p_o
| Run Ghc CompileHs Stage1: rts/Updates.cmm => _build/stage1/rts/build/cmm/Updates.thr_p_o
| Run Ghc CompileHs Stage1: rts/Compact.cmm => _build/stage1/rts/build/cmm/Compact.thr_p_o
| Run Ghc CompileHs Stage1: rts/Exception.cmm => _build/stage1/rts/build/cmm/Exception.o
| Run Ghc CompileHs Stage1: rts/PrimOps.cmm => _build/stage1/rts/build/cmm/PrimOps.debug_o
| Run Ghc CompileHs Stage1: rts/StgStartup.cmm => _build/stage1/rts/build/cmm/StgStartup.debug_p_o
| Run Ghc CompileHs Stage1: rts/HeapStackCheck.cmm => _build/stage1/rts/build/cmm/HeapStackCheck.p_o
| Run Ghc CompileHs Stage1: rts/StgStartup.cmm => _build/stage1/rts/build/cmm/StgStartup.thr_debug_p_o
| Run Ghc CompileHs Stage1: rts/StgMiscClosures.cmm => _build/stage1/rts/build/cmm/StgMiscClosures.debug_o
| Run Ghc CompileHs Stage1: rts/StgMiscClosures.cmm => _build/stage1/rts/build/cmm/StgMiscClosures.p_o
| Run Ghc CompileHs Stage1: rts/ContinuationOps.cmm => _build/stage1/rts/build/cmm/ContinuationOps.debug_p_o
| Run Ghc CompileHs Stage1: rts/Updates.cmm => _build/stage1/rts/build/cmm/Updates.debug_p_o
| Run Ghc CompileHs Stage1: rts/StgStartup.cmm => _build/stage1/rts/build/cmm/StgStartup.thr_p_o
| Run Ghc CompileHs Stage1: rts/Apply.cmm => _build/stage1/rts/build/cmm/Apply.o
| Run Ghc CompileHs Stage1: rts/Exception.cmm => _build/stage1/rts/build/cmm/Exception.debug_o
| Run Ghc CompileHs Stage1: rts/Apply.cmm => _build/stage1/rts/build/cmm/Apply.thr_dyn_o
Command line: _build/stage0/bin/ghc -Wall -Wcompat -hisuf debug_p_hi -osuf debug_p_o -hcsuf debug_p_hc -static -prof -DDEBUG -optc-DDEBUG -hide-all-packages -no-user-package-db '-package-env -'
'-package-db _build/stage1/inplace/package.conf.d' '-this-unit-id rts-1.0.2' -i -i/home/glaubitz/ghc-deb-9.6.5-new/ghc-9.6.5/_build/stage1/rts/build
-i/home/glaubitz/ghc-deb-9.6.5-new/ghc-9.6.5/_build/stage1/rts/build/autogen -i/home/glaubitz/ghc-deb-9.6.5-new/ghc-9.6.5/rts -Irts/include -I_build/stage1/rts/build -I_build/stage1/rts/build/include
-I_build/stage1/rts/build/@FFIIncludeDir@ -I_build/stage1/rts/build/@LibdwIncludeDir@ -Irts/include -Irts/@FFIIncludeDir@ -Irts/@LibdwIncludeDir@ -optP-include
-optP_build/stage1/rts/build/autogen/cabal_macros.h -ghcversion-file=rts/include/ghcversion.h -outputdir _build/stage1/rts/build -fdiagnostics-color=always -Wnoncanonical-monad-instances
-optc-Wno-error=inline -optP-Wno-nonportable-include-path -c rts/ContinuationOps.cmm -o _build/stage1/rts/build/cmm/ContinuationOps.debug_p_o -O2 -H32m -this-unit-id rts -XHaskell98 -no-global-package-db
-package-db=/home/glaubitz/ghc-deb-9.6.5-new/ghc-9.6.5/_build/stage1/inplace/package.conf.d -ghcversion-file=rts/include/ghcversion.h -ghcversion-file=rts/include/ghcversion.h -haddock -Irts
-I_build/stage1/rts/build '-DRtsWay="rts_debug_p"' -DFS_NAMESPACE=rts -DCOMPILING_RTS -DPROFILING -Wno-deprecated-flags -Wcpp-undef
===> Command failed with error code: 1
ghc: panic! (the 'impossible' happened)
GHC version 9.6.5:
PPC.Ppr.pprInstr: JMP to ForeignLabel
CallStack (from HasCallStack):
panic, called at compiler/GHC/CmmToAsm/PPC/Ppr.hs:591:30 in ghc:GHC.CmmToAsm.PPC.Ppr


Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug

Error when running Shake build system:
at want, called at src/Main.hs:124:44 in main:Main
* Depends on: binary-dist-dir
at need, called at src/Rules/BinaryDist.hs:130:9 in main:Rules.BinaryDist
* Depends on: _build/stage1/lib/package.conf.d/rts-1.0.2.conf
at need, called at src/Rules/Register.hs:134:5 in main:Rules.Register
* Depends on: _build/stage1/rts/build/stamp-rts-1.0.2_debug_p
at need, called at src/Rules/Library.hs:144:3 in main:Rules.Library
* Depends on: _build/stage1/rts/build/libHSrts-1.0.2_debug_p.a
at need, called at src/Rules/Library.hs:220:5 in main:Rules.Library
* Depends on: _build/stage1/rts/build/cmm/ContinuationOps.debug_p_o
at cmd', called at src/Builder.hs:387:23 in main:Builder
at cmdArgs, called at src/Builder.hs:540:8 in main:Builder
at cmdArgs, called at src/Builder.hs:564:18 in main:Builder
at cmdArgs, called at src/Builder.hs:564:18 in main:Builder
at error, called at src/Builder.hs:609:13 in main:Builder
* Raised the exception:
Command failed

Build failed.

Adrian