Discussion:
[OpenAFS-devel] OpenAFS release team weekly meeting
Michael Meffie
2018-01-09 18:24:00 UTC
OpenAFS release team weekly meeting

Date: Jan 5, 2018
Participants:
* Stephan Wiesand - Release Manager 1.6
* Ben Kaduk (Release Manager 1.8)
* Mark Vitale

The weekly release team meetings are held on Fridays at 14:00 GMT on jabber
(xmpp) in release-***@conference.openafs.org. Please request login
information if you would like to participate. Logs are available at
https://conference.openafs.org/release-***@conference.openafs.org/

1.6.x stable series
===================

1.6.22.2
--------

The next planned 1.6.x release will be 1.6.22.2, to support
Linux 4.15:

- macos high sierra support
- linux 4.15 support
- changes to address getcwd() issues on RHEL7.4 (now merged on master)
- rpm specfile changes to build kernel-debuginfo package

(The rpm specfile changes may be deferred.)


1.6.23pre1
----------

Candidates for 1.6.23pre1

Rx fix already on master/1.8:

7784 rx: rxi_ReceiveDataPacket do not set rprev on drop

Fixes for ubik quorum loss issues:

12803 ubik: avoid early DISK_Begin calls we know will fail
12806 ubik: update epoch as soon as sync-site is elected
12807 ubik: remove useless signal call
12811 ubik: allow remote dbase relabel if up to date

Other candidates, backported from 1.8.x/master. Reviews
for 1.6.x branch are needed.

12684 libafs: avoid resetting the dynroot volume every 10 minutes
12685 libafs: rename volume accessTime to setupTime
12686 libafs: update the volume setup time when the vldb is rechecked
12687 libafs: vldb cache timeout option (-volume-ttl)

12667 afs: fix afs_xserver deadlock in afsdb refresh
12645 Put jhutz's ubik analysis in doc/txt
12646 doc: Add introduction and credits to ubik.txt
12666 Linux: fix whitespace in osi_sysctl.c
12643 afs: Improve "Corrupt directory" warning

Other candidates may be proposed. Reviews are welcome.

Note: the following recent change on master is not a candidate for 1.6.x
(it is a build fix for 1.8.x):

12853 rx: remove trailing semicolons from FBSD mutex operations


1.8.x series
============

1.8.0pre4 (beta) was released (shortly after the meeting). This is considered
the final beta release. Test reports are appreciated. The next release will be
a release candidate. Known items to be addressed in the next release:

* the FreeBSD build fix (12853) and new sysnames for FreeBSD
* autoconf refactoring changes
--
Michael Meffie <***@sinenomine.net>
Stephan Wiesand
2018-01-10 20:40:35 UTC
Thanks a lot Mike!

Ben asked for links to changes or change stacks to ease review. So here they are, inline, together with a few comments:
Post by Michael Meffie
OpenAFS release team weekly meeting
Date: Jan 5, 2018
* Stephan Wiesand - Release Manager 1.6
* Ben Kaduk (Release Manager 1.8)
* Mark Vitale
The weekly release team meetings are held on Fridays at 14:00 GMT on jabber
information if you would like to participate. Logs are available at
1.6.x stable series
===================
1.6.22.2
--------
The next planned 1.6.x release will be 1.6.22.2 release to support
- macos high sierra support
https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:highsierra
Post by Michael Meffie
- linux 4.15 support
https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:Linux-4.15
Post by Michael Meffie
- changes to address getcwd() issues on RHEL7.4 (now merged on master)
https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:RH74_shakeloose_refactor
Post by Michael Meffie
- rpm specfile changes to build kernel-debuginfo package
https://gerrit.openafs.org/12658
https://gerrit.openafs.org/12818
Post by Michael Meffie
(The rpm specfile changes may be deferred.)
1.6.23pre1
----------
Candidates for 1.6.23pre1
7784 rx: rxi_ReceiveDataPacket do not set rprev on drop
Meanwhile pulled up, with two variants as discussed in the meeting:

https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:7784-v1

https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:7784-v2
Post by Michael Meffie
12803 ubik: avoid early DISK_Begin calls we know will fail
12806 ubik: update epoch as soon as sync-site is elected
12807 ubik: remove useless signal call
12811 ubik: allow remote dbase relabel if up to date
https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x+topic:ubik
Post by Michael Meffie
Other candidates, backported from 1.8.x/master. Reviews
for 1.6.x branch are needed.
12684 libafs: avoid resetting the dynroot volume every 10 minutes
12685 libafs: rename volume accessTime to setupTime
12686 libafs: update the volume setup time when the vldb is rechecked
12687 libafs: vldb cache timeout option (-volume-ttl)
12667 afs: fix afs_xserver deadlock in afsdb refresh
12645 Put jhutz's ubik analysis in doc/txt
12646 doc: Add introduction and credits to ubik.txt
12666 Linux: fix whitespace in osi_sysctl.c
12643 afs: Improve "Corrupt directory" warning
Other candidates may be proposed. Reviews are welcome.
Thanks again,
Stephan
Post by Michael Meffie
Note, the recent change on master which is not a candidate for 1.6.x
12853 rx: remove trailing semicolons from FBSD mutex operations
1.8.x series
============
1.8.0pre4 (beta) was released (shortly after the meeting). This is considered
the final beta release. Test reports are appreciated. The next release will be
* the FreeBSD build fix (12853) and new sysnames for FreeBSD
* autoconf refactoring changes
--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany
m***@gmail.com
2018-01-12 18:37:43 UTC
Post by Michael Meffie
1.8.x series
============
1.8.0pre4 (beta) was released (shortly after the meeting). This is considered
the final beta release. Test reports are appreciated. The next release will be
* the FreeBSD build fix (12853) and new sysnames for FreeBSD
* autoconf refactoring changes
Openafs kernel module compiled from git using version: 1.8.0pre4

emerge --info

Portage 2.3.13 (python 2.7.14-final-0,
default/linux/amd64/17.0/desktop/gnome/systemd, gcc-7.2.0, glibc-2.26-r5,
4.15.0-rc7 x86_64)
=================================================================
System uname: Linux-4.15.0-rc7-x86_64-Intel-R-_Xeon-R-***@_2.40GHz-with-
gentoo-2.4.1
KiB Mem: 12353744 total, 8054304 free
KiB Swap: 12582908 total, 12582908 free
sh bash 4.4_p12
ld GNU ld (Gentoo 2.29.1 p3) 2.29.1
app-shells/bash: 4.4_p12::gentoo
dev-lang/perl: 5.24.3::gentoo
dev-lang/python: 2.7.14-r1::gentoo, 3.4.5-r1::gentoo, 3.5.4-r1::gentoo,
3.6.3-r1::gentoo
dev-util/cmake: 3.8.2::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.4.1-r2::gentoo
sys-apps/openrc: 0.34.11::gentoo
sys-apps/sandbox: 2.10-r4::gentoo
sys-devel/autoconf: 2.13::gentoo, 2.69-r4::gentoo
sys-devel/automake: 1.11.6-r2::gentoo, 1.12.6::gentoo, 1.14.1::gentoo,
1.15.1-r1::gentoo
sys-devel/binutils: 2.29.1-r1::gentoo
sys-devel/gcc: 6.3.0::gentoo, 7.2.0-r1::gentoo
sys-devel/gcc-config: 1.8-r1::gentoo
sys-devel/libtool: 2.4.6-r3::gentoo
sys-devel/make: 4.2.1::gentoo
sys-kernel/linux-headers: 4.14::gentoo (virtual/os-headers)
sys-libs/glibc: 2.26-r5::gentoo

kernel: afs: disk cache read error in CacheItems slot 69050 off 5524020/8000020
code -4/80
kernel: openafs: afs_InvalidateAllSegments tdc count
kernel: ------------[ cut here ]------------
kernel: Kernel BUG at 0000000080540eff [verbose debug info unavailable]
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: libafs(PO) mcryptd sha256_ssse3 sha256_generic
cfg80211 cbc rbd libceph iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp xt_physdev
br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
libcrc32c crc32c_generic iptable_filter ip_tables x_tables nf_tables nfnetlink
bridge stp llc mousedev joydev hid_logitech_hidpp snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi hp_wmi snd_hda_intel sparse_keymap
snd_hda_codec gpio_ich rfkill wmi_bmof psmouse snd_hda_core intel_powerclamp
pcspkr snd_pcm wmi snd_timer rtc_cmos snd evdev usbmouse hid_logitech_dj
input_leds acpi_cpufreq lpc_ich soundcore i7core_edac button sch_fq_codel
kyber_iosched bfq vhost_net vhost tap tun kvm_intel kvm irqbypass smsc47b397
coretemp hid_generic usbkbd btrfs usbhid xor zstd_decompress
kernel: zstd_compress xxhash raid6_pq sr_mod sd_mod cdrom amdgpu uhci_hcd chash
i2c_algo_bit backlight drm_kms_helper cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev ahci ttm libahci
crc32c_intel atkbd libata tg3 drm serio_raw ehci_pci firewire_ohci ehci_hcd ptp
scsi_mod firewire_core pps_core usbcore libphy crc_itu_t agpgart hwmon i2c_core
floppy unix ipv6 autofs4
kernel: CPU: 15 PID: 91713 Comm: tracker-store Tainted:
P IO 4.15.0-rc7 #1
kernel: Hardware name: Hewlett-Packard HP Z600 Workstation/0AE8h, BIOS 786G4
v03.19 03/11/2011
kernel: RIP: 0010:afs_InvalidateAllSegments+0x42e/0x430 [libafs]
kernel: RSP: 0000:ffffc9000b1f3e28 EFLAGS: 00010292
kernel: RAX: 000000000000002c RBX: 0000000000000001 RCX: ffffffff81c3eb98
kernel: RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff81f79584
kernel: RBP: ffff8802fff63740 R08: 0000000000001518 R09: ffffffff81f7b9c2
kernel: R10: ffff8801a9807000 R11: 0000000000000000 R12: 0000000000010dba
kernel: R13: 0000000000000000 R14: 00000000000006d2 R15: 0000000000000000
kernel: FS: 00007ff04f68f7c0(0000) GS:ffff88032fdc0000(0000)
knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f91fc057630 CR3: 00000001390c1000 CR4: 00000000000026e0
kernel: Call Trace:
kernel: afs_StoreAllSegments+0x584/0xbe0 [libafs]
kernel: afs_linux_flush+0x482/0x500 [libafs]
kernel: filp_close+0x22/0x70
kernel: SyS_close+0x1a/0x40
kernel: entry_SYSCALL_64_fastpath+0x13/0x6c
kernel: RIP: 0033:0x7ff04e2d2910
kernel: RSP: 002b:00007ffcc2324530 EFLAGS: 00000293
kernel: Code: 48 c7 c7 a0 82 bd a0 e8 51 9e ff ff e9 79 ff ff ff 48 c7 c7 e0 d4
bc a0 e8 15 68 54 e0 0f 0b 48 c7 c7 b0 d4 bc a0 e8 07 68 54 e0 <0f> 0b 41 57 41
56 41 55 41 54 55 53 48 89 fb 48 81 c7 b0 02 00
kernel: RIP: afs_InvalidateAllSegments+0x42e/0x430 [libafs] RSP:
ffffc9000b1f3e28
kernel: ---[ end trace 709a142c7fd521a4 ]---
Mark Vitale
2018-01-14 20:33:49 UTC
Markus,
Post by m***@gmail.com
Openafs kernel module compiled from git using version: 1.8.0pre4
emerge --info
Portage 2.3.13 (python 2.7.14-final-0,
default/linux/amd64/17.0/desktop/gnome/systemd, gcc-7.2.0, glibc-2.26-r5,
4.15.0-rc7 x86_64)
=================================================================
kernel: afs: disk cache read error in CacheItems slot 69050 off 5524020/8000020 code -4/80
Code -4 is EINTR (interrupt) during the read.
Since this happened in afs_GetValidDSlot (afs_UFSGetDSlot), it returns NULL for the dcache (tdc).
Post by m***@gmail.com
kernel: openafs: afs_InvalidateAllSegments tdc count
kernel: ------------[ cut here ]------------
kernel: Kernel BUG at 0000000080540eff [verbose debug info unavailable]
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: libafs(PO) mcryptd sha256_ssse3 sha256_generic
cfg80211 cbc rbd libceph iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp xt_physdev
br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
libcrc32c crc32c_generic iptable_filter ip_tables x_tables nf_tables nfnetlink
bridge stp llc mousedev joydev hid_logitech_hidpp snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi hp_wmi snd_hda_intel sparse_keymap
snd_hda_codec gpio_ich rfkill wmi_bmof psmouse snd_hda_core intel_powerclamp
pcspkr snd_pcm wmi snd_timer rtc_cmos snd evdev usbmouse hid_logitech_dj
input_leds acpi_cpufreq lpc_ich soundcore i7core_edac button sch_fq_codel
kyber_iosched bfq vhost_net vhost tap tun kvm_intel kvm irqbypass smsc47b397
coretemp hid_generic usbkbd btrfs usbhid xor zstd_decompress
kernel: zstd_compress xxhash raid6_pq sr_mod sd_mod cdrom amdgpu uhci_hcd chash
i2c_algo_bit backlight drm_kms_helper cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev ahci ttm libahci
crc32c_intel atkbd libata tg3 drm serio_raw ehci_pci firewire_ohci ehci_hcd ptp
scsi_mod firewire_core pps_core usbcore libphy crc_itu_t agpgart hwmon i2c_core
floppy unix ipv6 autofs4
P IO 4.15.0-rc7 #1
kernel: Hardware name: Hewlett-Packard HP Z600 Workstation/0AE8h, BIOS 786G4
v03.19 03/11/2011
kernel: RIP: 0010:afs_InvalidateAllSegments+0x42e/0x430 [libafs]
kernel: RSP: 0000:ffffc9000b1f3e28 EFLAGS: 00010292
kernel: RAX: 000000000000002c RBX: 0000000000000001 RCX: ffffffff81c3eb98
kernel: RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff81f79584
kernel: RBP: ffff8802fff63740 R08: 0000000000001518 R09: ffffffff81f7b9c2
kernel: R10: ffff8801a9807000 R11: 0000000000000000 R12: 0000000000010dba
kernel: R13: 0000000000000000 R14: 00000000000006d2 R15: 0000000000000000
kernel: FS: 00007ff04f68f7c0(0000) GS:ffff88032fdc0000(0000)
knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f91fc057630 CR3: 00000001390c1000 CR4: 00000000000026e0
kernel: afs_StoreAllSegments+0x584/0xbe0 [libafs]
kernel: afs_linux_flush+0x482/0x500 [libafs]
kernel: filp_close+0x22/0x70
kernel: SyS_close+0x1a/0x40
kernel: entry_SYSCALL_64_fastpath+0x13/0x6c
kernel: RIP: 0033:0x7ff04e2d2910
kernel: RSP: 002b:00007ffcc2324530 EFLAGS: 00000293
kernel: Code: 48 c7 c7 a0 82 bd a0 e8 51 9e ff ff e9 79 ff ff ff 48 c7 c7 e0 d4
bc a0 e8 15 68 54 e0 0f 0b 48 c7 c7 b0 d4 bc a0 e8 07 68 54 e0 <0f> 0b 41 57 41
56 41 55 41 54 55 53 48 89 fb 48 81 c7 b0 02 00
ffffc9000b1f3e28
kernel: ---[ end trace 709a142c7fd521a4 ]---
How many times have you seen this problem? Are you able to reproduce it at will?

What is the backend filesystem for your AFS cache partition?
Is it possible it was slow or hung at the time, leading someone to try
an interrupt to free the hang? Could you share the syslog that precedes
the panic?

Are you able to share the core file from this panic?
If not, would you be willing to examine it with the ‘crash’ utility
and provide the backtraces from the other OpenAFS kernel threads at
the time of the crash?



Regards,

Mark Vitale
OpenAFS release team
m***@gmail.com
2018-01-14 22:45:52 UTC
Post by Mark Vitale
Markus,
Post by m***@gmail.com
Openafs kernel module compiled from git using version: 1.8.0pre4
emerge --info
Portage 2.3.13 (python 2.7.14-final-0,
default/linux/amd64/17.0/desktop/gnome/systemd, gcc-7.2.0, glibc-2.26-r5,
4.15.0-rc7 x86_64)
=================================================================
kernel: afs: disk cache read error in CacheItems slot 69050 off 5524020/8000020 code -4/80
Code -4 is EINTR (interrupt) during the read.
Since this happened in afs_GetValidDSlot (afs_UFSGetDSlot), it returns NULL for the dcache (tdc).
Post by m***@gmail.com
kernel: openafs: afs_InvalidateAllSegments tdc count
kernel: ------------[ cut here ]------------
kernel: Kernel BUG at 0000000080540eff [verbose debug info unavailable]
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: libafs(PO) mcryptd sha256_ssse3 sha256_generic
cfg80211 cbc rbd libceph iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp xt_physdev
br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
libcrc32c crc32c_generic iptable_filter ip_tables x_tables nf_tables nfnetlink
bridge stp llc mousedev joydev hid_logitech_hidpp snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi hp_wmi snd_hda_intel sparse_keymap
snd_hda_codec gpio_ich rfkill wmi_bmof psmouse snd_hda_core intel_powerclamp
pcspkr snd_pcm wmi snd_timer rtc_cmos snd evdev usbmouse hid_logitech_dj
input_leds acpi_cpufreq lpc_ich soundcore i7core_edac button sch_fq_codel
kyber_iosched bfq vhost_net vhost tap tun kvm_intel kvm irqbypass smsc47b397
coretemp hid_generic usbkbd btrfs usbhid xor zstd_decompress
kernel: zstd_compress xxhash raid6_pq sr_mod sd_mod cdrom amdgpu uhci_hcd chash
i2c_algo_bit backlight drm_kms_helper cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev ahci ttm libahci
crc32c_intel atkbd libata tg3 drm serio_raw ehci_pci firewire_ohci ehci_hcd ptp
scsi_mod firewire_core pps_core usbcore libphy crc_itu_t agpgart hwmon i2c_core
floppy unix ipv6 autofs4
P IO 4.15.0-rc7 #1
kernel: Hardware name: Hewlett-Packard HP Z600 Workstation/0AE8h, BIOS 786G4
v03.19 03/11/2011
kernel: RIP: 0010:afs_InvalidateAllSegments+0x42e/0x430 [libafs]
kernel: RSP: 0000:ffffc9000b1f3e28 EFLAGS: 00010292
kernel: RAX: 000000000000002c RBX: 0000000000000001 RCX: ffffffff81c3eb98
kernel: RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff81f79584
kernel: RBP: ffff8802fff63740 R08: 0000000000001518 R09: ffffffff81f7b9c2
kernel: R10: ffff8801a9807000 R11: 0000000000000000 R12: 0000000000010dba
kernel: R13: 0000000000000000 R14: 00000000000006d2 R15: 0000000000000000
kernel: FS: 00007ff04f68f7c0(0000) GS:ffff88032fdc0000(0000)
knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f91fc057630 CR3: 00000001390c1000 CR4: 00000000000026e0
kernel: afs_StoreAllSegments+0x584/0xbe0 [libafs]
kernel: afs_linux_flush+0x482/0x500 [libafs]
kernel: filp_close+0x22/0x70
kernel: SyS_close+0x1a/0x40
kernel: entry_SYSCALL_64_fastpath+0x13/0x6c
kernel: RIP: 0033:0x7ff04e2d2910
kernel: RSP: 002b:00007ffcc2324530 EFLAGS: 00000293
kernel: Code: 48 c7 c7 a0 82 bd a0 e8 51 9e ff ff e9 79 ff ff ff 48 c7 c7 e0 d4
bc a0 e8 15 68 54 e0 0f 0b 48 c7 c7 b0 d4 bc a0 e8 07 68 54 e0 <0f> 0b 41 57 41
56 41 55 41 54 55 53 48 89 fb 48 81 c7 b0 02 00
ffffc9000b1f3e28
kernel: ---[ end trace 709a142c7fd521a4 ]---
How many times have you seen this problem? Are you able to reproduce it at will?
I have seen this a few times. It seems that it happens only
a) when I am logging out of a GNOME (in my case Wayland) session
b) when I am unlocking the GNOME screensaver
c) when I am changing virtual terminals
Post by Mark Vitale
What is the backend filesystem for your AFS cache partition?
The AFS cache partition is a btrfs subvolume:
cacheinfo: /afs:/mnt/ssd/openafs_cache:20000000
Post by Mark Vitale
Is it possible it was slow or hung at the time, leading someone to try
an interrupt to free the hang? Could you share the syslog that precedes
the panic?
My home directory is under AFS and there are other AFS volumes mounted as well:
/afs/my_chell/user/my_home_dir
/afs/my_chell/user/my_home_dir/another_afs_volume

Maybe the DAFS file server has detached the unused volume "another_afs_volume"
and some GNOME process goes mad when the volume attach takes a long time?
(Some of my /vicep partitions are hosted on Ceph, so attach is sometimes slow.)

Sorry, systemd has already destroyed the logs.
Post by Mark Vitale
Are you able to share the core file from this panic?
If not, would you be willing to examine it with the ‘crash’ utility
and provide the backtraces from the other OpenAFS kernel threads at
the time of the crash?
Sorry, coredumpctl didn't catch any core file.

-Markus
Harald Barth
2018-01-15 09:05:25 UTC
Post by m***@gmail.com
Post by Mark Vitale
What is the backend filesystem for your AFS cache partition?
AFS cache partition is btrfs subvolume,
cacheinfo: /afs:/mnt/ssd/openafs_cache:20000000
Can you put your cache in ext[234] and try again?
Post by m***@gmail.com
Sorry, systemd has destroyed logs already
So the systemd craze has reached gentoo as well.

Harald.
m***@gmail.com
2018-01-15 18:26:16 UTC
Post by Harald Barth
Post by m***@gmail.com
Post by Mark Vitale
What is the backend filesystem for your AFS cache partition?
AFS cache partition is btrfs subvolume,
cacheinfo: /afs:/mnt/ssd/openafs_cache:20000000
Can you put your cache in ext[234] and try again?
I don't have any partitions available, but I can give up
swap, format that partition as ext4, and give it a try.
Post by Harald Barth
Post by m***@gmail.com
Sorry, systemd has destroyed logs already
So the systemd craze has reached gentoo as well.
Yep, that craze is supported also :)

-Markus
Stephan Wiesand
2018-01-15 18:39:00 UTC
Post by m***@gmail.com
Post by Harald Barth
Post by m***@gmail.com
Post by Mark Vitale
What is the backend filesystem for your AFS cache partition?
AFS cache partition is btrfs subvolume,
cacheinfo: /afs:/mnt/ssd/openafs_cache:20000000
Can you put your cache in ext[234] and try again?
I don't have any partitions available but I can give up
swap and format that partition to ext4 and give a try.
Removing all swap space may cause other "interesting" issues. If you have sufficient space left on some filesystem, you could instead create a file with a filesystem on it and loop mount that:

dd if=/dev/zero of=/where/space/left/image bs=1M count=2048
mkfs.ext[234] /where/space/left/image
mount -oloop /where/space/left/image /mnt/openafs_cache

In the worst case you can make that file sparse, as long as the filesystem has enough free space to hold the blocks that actually get used.
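
A minimal sketch of the sparse variant, assuming GNU coreutils is available. The helper name make_sparse_image is made up for illustration and is not part of OpenAFS; the paths are the same placeholders as above. truncate sets the file length without writing any data, so the image only consumes space as the filesystem inside it fills up:

```shell
# Sketch only: make_sparse_image is a hypothetical helper, not an OpenAFS tool.
# truncate(1) sets the file length without writing data, so blocks are
# allocated only as the filesystem inside the image starts using them.
make_sparse_image() {
    image=$1          # e.g. /where/space/left/image (placeholder path)
    size=${2:-2G}     # 2G here is 2048 MiB, the same size as the dd example
    truncate -s "$size" "$image" || return 1
    # Show the apparent size, then the space actually allocated so far.
    du -h --apparent-size "$image"
    du -h "$image"
}

# The remaining steps are unchanged from the dd variant and need root:
#   make_sparse_image /where/space/left/image 2G
#   mkfs.ext4 /where/space/left/image
#   mount -oloop /where/space/left/image /mnt/openafs_cache
```

The trade-off is that a sparse image can run the hosting filesystem out of space later, when the cache grows, rather than failing up front at creation time.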
Post by m***@gmail.com
Post by Harald Barth
Post by m***@gmail.com
Sorry, systemd has destroyed logs already
So the systemd craze has reached gentoo as well.
Yep, that craze is supported also :)
Let's face it, systemd is upon us, and it's not that bad. What is bad is the way it's used (configured) in current distros.
Harald Barth
2018-01-16 08:50:57 UTC
Post by Stephan Wiesand
Post by m***@gmail.com
I don't have any partitions available but I can give up
swap and format that partition to ext4 and give a try.
Removing all swap space may cause other "interesting" issues. If you
have sufficient space left on some filesystem, you could instead
Well, with the amounts of RAM today, most workloads don't need swap.
If your process exceeds your RAM, there is often something wrong with
it and it's better it terminates right away instead of first slowing
down the whole machine to a crawl. PDC's compute nodes run without
swap and we have seen no problems from that. If you have a firefox
with 600 tabs (I recently read an error report about that) of course
your mileage may differ ;-) ;-)
Post by Stephan Wiesand
dd if=/dev/zero of=/where/space/left/image bs=1M count=2048
mkfs.ext[234] /where/space/left/image
mount -oloop /where/space/left/image /mnt/openafs_cache
Yes, that's of course a good way to test as well.
Post by Stephan Wiesand
Let's face it, systemd is upon us, and it's not that bad. What is bad is the way it's used (configured) in current distros.
Well, I don't know if I should complain about systemd or to the folks
who wrote the rule how systemd should start openafs. Result was that
systemd restarted the openafs service endlessly until there were so
many mounts on /afs that RAM was exhausted (oops ;-)

Harald.
m***@gmail.com
2018-01-16 19:13:36 UTC
Post by m***@gmail.com
I don't have any partitions available but I can give up
swap and format that partition to ext4 and give a try.
This time the cache partition file system is ext4. A shortened version of the journalctl output:
https://drive.google.com/file/d/1GTc3ZxnILxIbyroN_mlbeLs72O66YQ_y/view?usp=sharing

It seems that I can reproduce this quite easily. I just log out of the GNOME session.


-Markus
Mark Vitale
2018-01-16 20:12:03 UTC
Post by m***@gmail.com
Post by m***@gmail.com
I don't have any partitions available but I can give up
swap and format that partition to ext4 and give a try.
https://drive.google.com/file/d/1GTc3ZxnILxIbyroN_mlbeLs72O66YQ_y/view?usp=sharing
It seems that I can reproduce this quite easily. I just log out gnome session.
Thanks for this. From your journalctl excerpt:

tammi 16 20:31:45 z600.station.com systemd[3023]: Received SIGRTMIN+24 from PID 3448 (kill).
tammi 16 20:31:45 z600.station.com tracker-miner-fs.desktop[2885]: OK
tammi 16 20:31:45 z600.station.com gdm-password][2403]: pam_unix(gdm-password:session): session closed for user masu
tammi 16 20:31:45 z600.station.com systemd[1]: Stopped User Manager for UID 0.
tammi 16 20:31:45 z600.station.com systemd[1]: Removed slice User Slice of root.
tammi 16 20:31:45 z600.station.com kernel: afs: disk cache read error in CacheItems slot 98197 off 7855780/8000020 code -4/80
tammi 16 20:31:45 z600.station.com kernel: openafs: afs_InvalidateAllSegments tdc count
<panic>


it seems quite likely that SIGRTMIN+24 is the signal that interrupted the OpenAFS disk cache read (code -4 is EINTR). I’ve seen cases before where user-defined signals have caused trouble for OpenAFS, but I can’t remember the details at the moment, nor what was done to solve the problem. But I imagine that your environment requires this signal, and thus OpenAFS may need mods to cope with it.


Mark Vitale
OpenAFS release team
Harald Barth
2018-01-17 09:06:09 UTC
Post by Mark Vitale
tammi 16 20:31:45 z600.station.com kernel: afs: disk cache read error in CacheItems slot 98197 off 7855780/8000020 code -4/80
This sounded familiar. Hm. Found it:
https://lists.openafs.org/pipermail/openafs-info/2013-October/040215.html
https://rt.central.org/rt/Ticket/Display.html?id=131747&user=guest&pass=guest

But I don't know what has happened since 2013.

So what's going on here?

* Every login starts a systemd --user process (even root)? By whom?
(obviously every login needs its systemd :-/)
* When logging out / ending that session (?) that systemd --user is terminated with SIGRTMIN+24?
* When getting this signal, process dies (of course) and disk cache read is interrupted?
* Kernel panics on interrupted read?

Harald.
Peter Gille
2018-01-17 09:44:06 UTC
Post by Harald Barth
So what's going on here?
Answering your systemd questions at least.
Post by Harald Barth
* Every login starts a systemd --user process
No, it starts a systemd --user if one is not already started for the
user that just logged in. It will then close that process on the last
logout. So at any time there should be one systemd --user per $USER.

Also, I *think* it's configurable if you want it to start only on local
logins or also on ssh logins etc.
Post by Harald Barth
(even root)?
I think this is configurable, but I'm not sure. By default yes at least.
Post by Harald Barth
By whom?
pam-systemd notifies logind and they then collaborate on setting up the
user session (it also sets up cgroups etc). See man 8 pam_systemd.

If it fails for whatever reason it will time out after a fairly short
time and allow login anyway (at least according to my experience).
Post by Harald Barth
(obviously every login needs its systemd :-/)
To start user services that want to be started on login (mpd, mail
fetch, pulseaudio, various desktop environment processes, ...) it can be
practical, yes. Probably you don't want to have many services like this
triggered for root, though, but that's up to you as the root user to
decide.
Post by Harald Barth
* When logging out / ending that session (?) that systemd --user is
terminated with SIGRTMIN+24?
See the first answer. It will close when the last login session for
that user ends. This might also tear down other processes in the user's
cgroup.

See the 'loginctl enable-linger $user' option, and KillUserProcesses= in
logind.conf(5).
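
As a rough sketch of how to check that setting on a given machine: the helper name below is made up for illustration, and it only reads the main /etc/systemd/logind.conf file. It does not look at drop-in files under logind.conf.d or at the built-in default, so an empty result means "not set in this file" and nothing more.

```shell
# Sketch only: kill_user_processes_setting is a hypothetical helper.
# It prints the last uncommented KillUserProcesses= value found in a
# logind.conf style file, or nothing if the option is not set there.
kill_user_processes_setting() {
    conf=${1:-/etc/systemd/logind.conf}
    sed -n 's/^[[:space:]]*KillUserProcesses[[:space:]]*=[[:space:]]*//p' "$conf" |
        tail -n 1
}

# Usage on a systemd machine:
#   kill_user_processes_setting
#   loginctl enable-linger "$USER"   # keep this user's manager around after logout
```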
Post by Harald Barth
Harald.
Cheers,
Peter
Mark Vitale
2018-01-19 20:36:36 UTC
Post by m***@gmail.com
Post by m***@gmail.com
I don't have any partitions available but I can give up
swap and format that partition to ext4 and give a try.
https://drive.google.com/file/d/1GTc3ZxnILxIbyroN_mlbeLs72O66YQ_y/view?usp=sharing
Could you please provide a copy of config.log from your OpenAFS build?

Thanks,

--
Mark Vitale
***@sinenomine.net
m***@gmail.com
2018-01-21 14:05:27 UTC
Post by Mark Vitale
Could you please provide a copy of config.log from your OpenAFS build?
config.log:

https://drive.google.com/file/d/1WOPuOY5g9YLj5EqFjk0acXZMAA_-qMES/view?usp=sharing

Journal of the last minute:

https://drive.google.com/file/d/1ZsdDOi6PdUax0gMGxk5qNfLeF1MV6_5R/view?usp=sharing

This time it survived multiple logouts, but when I add some background
work ("compile kernel with -j16 in my AFS home directory") and log out, it
crashes.

-Markus

Benjamin Kaduk
2018-01-17 04:13:13 UTC
Post by Harald Barth
Well, I don't know if I should complain about systemd or to the folks
who wrote the rule how systemd should start openafs. Result was that
Perhaps this is needlessly pedantic, but there is no single
"rule" for how systemd should start the OpenAFS client -- the Red
Hat and Debian packaging are quite diverged in this regard. (As
Debian maintainer for openafs, I welcome Debian bug reports for
issues in this area or others.)

-Ben
Post by Harald Barth
systemd restarted the openafs service endlessly until there were so
many mounts on /afs that RAM was exhausted (oops ;-)
Michael Meffie
2018-01-16 16:32:02 UTC
OpenAFS release team weekly meeting

Date: Jan 12, 2018
Participants:
* Stephan Wiesand (Release Manager 1.6)
* Ben Kaduk (Release Manager 1.8)
* Michael Meffie
* Marcio Barbosa

The weekly release team meetings are held on Fridays at 14:00 GMT on jabber
(xmpp) in release-***@conference.openafs.org. Please request login
information if you would like to participate. Logs are available at
https://conference.openafs.org/release-***@conference.openafs.org/

1.6.x stable series
===================

NOTE:

A discussion was held on what the requirements are for merging commits on the
stable branch. Gerrits require +1 to be merged at a minimum. Gerrits changing
difficult areas, such as ubik, require multiple +1s from developers familiar
with that area of the code, even after being merged on the master branch.
Reviewers should note if the changes are back-ported correctly and are
appropriate for the stable branch.


1.6.22.2
--------

The next planned 1.6.x release will be 1.6.22.2, to support
Linux 4.15:

- macos high sierra support
topic:highsierra

12833 macos: make the OpenAFS client aware of APFS
12832 macos: packaging support for MacOS X 10.13
12831 macos: add support for MacOS 10.13

- linux 4.15 support
topic:Linux-4.15

12834 Linux: use plain page_cache_alloc
12835 Linux 4.15: check for 2nd argument to pagevec_init

- changes to address getcwd() issues on RHEL7.4
topic:RH74_shakeloose_refactor

12859 LINUX: consolidate duplicate code in osi_TryEvictDentries
12860 LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches()
12858 LINUX: consolidate duplicate code in canonical_dentry
12857 LINUX: add afs_d_alias_lock & _unlock compat wrappers
12856 LINUX: create afs_linux_dget() compat wrapper
12855 Revert "LINUX: do not use d_invalidate to evict dentries"
12854 Revert "LINUX: eliminate unused variable warning"

1.6.23pre1
----------

Candidates for 1.6.23pre1

Rx fix: Reviews are required.

12866 rx: rxi_ReceiveDataPacket do not set rprev on drop (7784-v2)
12865 rx: Add a helper function for delayed acks (7784-v2)
12864 rx: rxi_ReceiveDataPacket do not set rprev on drop (7784-v1)

Fixes for ubik quorum loss issues. These require reviews before they will be
considered for inclusion in 1.6.x. Note there are several ubik patches
pending for the master branch (see master branch below).

12814 ubik: avoid DISK_Begin on sites that didn't vote for sync
12811 ubik: allow remote dbase relabel if up to date
12807 ubik: remove useless signal call
12806 ubik: update epoch as soon as sync-site is elected
12803 ubik: avoid early DISK_Begin calls we know will fail

Other candidates, backported from 1.8.x/master. Reviews for 1.6.x branch are
needed.


12684 libafs: avoid resetting the dynroot volume every 10 minutes
12685 libafs: rename volume accessTime to setupTime
12686 libafs: update the volume setup time when the vldb is rechecked
12687 libafs: vldb cache timeout option (-volume-ttl)

12667 afs: fix afs_xserver deadlock in afsdb refresh
12645 Put jhutz's ubik analysis in doc/txt
12646 doc: Add introduction and credits to ubik.txt
12666 Linux: fix whitespace in osi_sysctl.c
12643 afs: Improve "Corrupt directory" warning

Other candidates may be proposed. Reviews are welcome.

1.8.x series
============

1.8.0pre4 (beta) is available. Test reports would be appreciated.
Some positive responses on irc (#openafs on freenode.net)

Andrew submitted a change for a deadlock in master/1.8 on old
kernels that have the splice alias race. May be needed for 1.8.0.

12868 LINUX: Avoid locking inode in check_dentry_race


master
======

The autoconf refactoring changes were merged on master recently.

Andrew raised the question of a possible memory leak in the rxgk crypto code
if krb5_init_context could sporadically fail. Ben has come around to the
position of embedding a krb5_context in the rxgk_key structure as the least-bad
option for now.

Ubik fixes. We need to know if these changes are required
for stable after the set of 12803,12806,12807,12811,12814

12716 ubik: update ubik_dbVersion during SDISK_SendFile
12640 ubik: check if epoch is sane before db relabel
--
Michael Meffie <***@sinenomine.net>
Michael Meffie
2018-01-19 17:59:24 UTC
OpenAFS release team weekly meeting

Date: Jan 19, 2018
Participants:
* Stephan Wiesand (Release Manager 1.6)
* Ben Kaduk (Release Manager 1.8)
* Michael Meffie
* Mark Vitale

The release team meetings are held on Fridays at 14:00 GMT on jabber
(xmpp) in release-***@conference.openafs.org. Please request login
information if you would like to participate. Logs are available at
https://conference.openafs.org/release-***@conference.openafs.org/

1.6.x stable series
===================

1.6.22.2
--------

Unfortunately, there are unavoidable delays due to the time taken by
Meltdown/Spectre work. The next planned 1.6.x release will
be 1.6.22.2, expected before Linux 4.15.

- macos high sierra support
topic:highsierra

12833 macos: make the OpenAFS client aware of APFS
12832 macos: packaging support for MacOS X 10.13
12831 macos: add support for MacOS 10.13

- linux 4.15 support
topic:Linux-4.15

12834 Linux: use plain page_cache_alloc
12835 Linux 4.15: check for 2nd argument to pagevec_init

- changes to address getcwd() issues on RHEL7.4
topic:RH74_shakeloose_refactor

12859 LINUX: consolidate duplicate code in osi_TryEvictDentries
12860 LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches()
12858 LINUX: consolidate duplicate code in canonical_dentry
12857 LINUX: add afs_d_alias_lock & _unlock compat wrappers
12856 LINUX: create afs_linux_dget() compat wrapper
12855 Revert "LINUX: do not use d_invalidate to evict dentries"
12854 Revert "LINUX: eliminate unused variable warning"

1.6.23pre1
----------

Candidates for 1.6.23 are gerrits >= 12645 listed under

https://gerrit.openafs.org/#/q/status:open+project:openafs+branch:openafs-stable-1_6_x


Rx fix: Reviews are required.

12866 rx: rxi_ReceiveDataPacket do not set rprev on drop (7784-v2)
12865 rx: Add a helper function for delayed acks (7784-v2)
12864 rx: rxi_ReceiveDataPacket do not set rprev on drop (7784-v1)

Fixes for ubik quorum loss issues.
Andrew Deason has provided comments and +1 in gerrit. More reviews
are needed, and 12716 is a blocker for these.

12814 ubik: avoid DISK_Begin on sites that didn't vote for sync
12811 ubik: allow remote dbase relabel if up to date
12807 ubik: remove useless signal call
12806 ubik: update epoch as soon as sync-site is elected
12803 ubik: avoid early DISK_Begin calls we know will fail


1.8.x series
============

Success report received.

One report on rpm building issues for 1.8.0pre4 for RHEL 6. Fixes to the spec
file have been submitted to gerrit (on master and 1.8.x).

Mike M. to backport the autoconf refactor.

Staged for pre5:
- spec file fixes
- autoconf refactor
- Andrew's deadlock avoidance in check_dentry_race (12868)
- ubik write fail bug fix (12716)
- freebsd sysname/OS support updates


master
======

Recently merged on the master branch:

12874 redhat: fix conditional for kernel-debuginfo files directive
--
Michael Meffie <***@sinenomine.net>