Discussion:
[OpenAFS-devel] Unable to release R/O volume -- 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted
Ian Wienand
2018-05-24 10:40:02 UTC
Hello,

We lost the backing storage on our R/O server /vicepa sometime
yesterday (it's cloud block storage out of our control, so it
disappeared in an unknown manner). Once things came back, we had
volumes in a range of mostly locked states from updates and "vos
release" operations triggered by update cron jobs.

Quite a few I could manually unlock and re-release, and things went
OK. Others have proven more of a problem.

To cut things short, there was a lot of debugging, and we ended up
with stuck transactions between the R/W and R/O servers and volumes
that could not be unlocked. Eventually we rebooted both to clear out
everything. In an attempt to just clear the R/O mirrors and start
again, I did the following for each problem volume:

vos unlock $MIRROR
vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
vos release -v $MIRROR
vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR

My theory was that this would completely remove the R/O mirror volume
and let us start fresh. I then proceeded to do a "vos release" on each
volume in sequence (more details in [1]).
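In loop form, the whole procedure was roughly the following (untested
sketch; the volume list here is just a placeholder for our real set of
problem volumes):

PROBLEM_MIRRORS="mirror.ubuntu-ports"   # placeholder; the real list was longer
for MIRROR in $PROBLEM_MIRRORS; do
    vos unlock $MIRROR
    vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
    vos release -v $MIRROR
    vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR
    vos release -v $MIRROR   # the per-volume releases were actually run afterwards
done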

However, this release on the new R/O volume has not worked. Here is
the output from the release of one of the volumes:

---
Thu May 24 09:49:54 UTC 2018
Kerberos initialization for service/***@OPENSTACK.ORG

mirror.ubuntu-ports
RWrite: 536871041 ROnly: 536871042
number of sites -> 3
server afs01.dfw.openstack.org partition /vicepa RW Site
server afs01.dfw.openstack.org partition /vicepa RO Site
server afs02.dfw.openstack.org partition /vicepa RO Site -- Not released
This is a complete release of volume 536871041
There are new RO sites; we will try to only release to new sites
Querying old RO sites for update times... done
RW vol has not changed; only releasing to new RO sites
Starting transaction on cloned volume 536871042... done
Creating new volume 536871042 on replication site afs02.dfw.openstack.org: done
This will be a full dump: read-only volume needs be created for new site
Starting ForwardMulti from 536871042 to 536871042 on afs02.dfw.openstack.org (entire volume).
Release failed: VOLSER: Problems encountered in doing the dump !
The volume 536871041 could not be released to the following 1 sites:
afs02.dfw.openstack.org /vicepa
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed
Thu May 24 09:51:49 UTC 2018
---

It triggers the salvage, on the (I presume) only partially cloned
volume, which logs:

---
05/24/2018 09:51:49 dispatching child to salvage volume 536871041...
05/24/2018 09:51:49 namei_ListAFSSubDirs: warning: VG 536871042 does not have a link table; salvager will recreate it.
05/24/2018 09:51:49 fileserver requested salvage of clone 536871042; scheduling salvage of volume group 536871041...
05/24/2018 09:51:49 VReadVolumeDiskHeader: Couldn't open header for volume 536871041 (errno 2).
05/24/2018 09:51:49 2 nVolumesInInodeFile 64
05/24/2018 09:51:49 CHECKING CLONED VOLUME 536871042.
05/24/2018 09:51:49 mirror.ubuntu-ports.readonly (536871042) updated 05/24/2018 06:08
05/24/2018 09:51:49 totalInodes 32896
---

On the R/O server side (afs02) we have

---
Thu May 24 09:49:55 2018 VReadVolumeDiskHeader: Couldn't open header for volume 536871042 (errno 2).
Thu May 24 09:49:55 2018 attach2: forcing vol 536871042 to error state (state 0 flags 0x0 ec 103)
Thu May 24 09:49:55 2018 1 Volser: CreateVolume: volume 536871042 (mirror.ubuntu-ports.readonly) created
Thu May 24 09:51:49 2018 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted
Thu May 24 09:51:49 2018 Scheduling salvage for volume 536871042 on part /vicepa over FSSYNC
---

I do not see anything on the R/W server side (afs01).

I have fsck'd the /vicepa partition on the RO server (afs02) and it is
OK.

I can not find much info on "IH_CREATE: File exists" which I assume is
the problem here. I would welcome any suggestions! Clearly removing and
re-adding the mirror with "vos remove" and "vos addsite" hasn't cleared
out enough state to recover things?

All servers are Xenial-based with its current 1.6.7-1ubuntu1.1
openafs packages.

Thanks,

-i

[1] http://lists.openstack.org/pipermail/openstack-infra/2018-May/005949.html
Michael Meffie
2018-05-24 15:58:21 UTC
On Thu, 24 May 2018 20:40:02 +1000
Post by Ian Wienand
I can not find much info on "IH_CREATE: File exists" which I assume is
the problem here.
Yes, there seem to be files left over. For that parent volume number (536871041)
the leftover files would be in the path /vicep*/AFSIDat/=0/=0++U
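For example, on afs02 something like

ls -l /vicepa/AFSIDat/=0/=0++U/special

should still show the leftover special files for that volume group,
assuming my guess about the path is right.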
Post by Ian Wienand
I would welcome any suggestions! Clearly removing and
re-adding the mirror with "vos remove" and "vos addsite" hasn't cleared
out enough state to recover things?
A full partition salvage on the RO server should remove the orphaned files:

bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS
--
Michael Meffie <***@sinenomine.net>
Ian Wienand
2018-05-28 00:16:17 UTC
Thank you for the response
Post by Michael Meffie
Post by Ian Wienand
I can not find much info on "IH_CREATE: File exists" which I assume is
the problem here.
Yes, there seem to be files left over. For that parent volume number (536871041)
the leftover files would be in the path /vicep*/AFSIDat/=0/=0++U
We (with the help of auristor) came to a similar conclusion for
another of our corrupted volumes using strace(). How did you
calculate that hash from the volume number to the AFSIDat/ path?

We ended up rm-ing the directory, and the release of that volume
worked.
Post by Michael Meffie
A full partition salvage on the ro server should remove the orphaned files,
bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS
I did run this as suggested, but it didn't seem sufficient to find
these orphaned files.

Here is the salvage log for

# bos salvage -server localhost -localauth -partition a -showlog -orphans attach -forceDAFS -volume mirror.ubuntu-ports.readonly

05/24/2018 23:36:52 dispatching child to salvage volume 536871041...
05/24/2018 23:36:52 VReadVolumeDiskHeader: Couldn't open header for volume 536871041 (errno 2).
05/24/2018 23:36:52 2 nVolumesInInodeFile 64
05/24/2018 23:36:52 CHECKING CLONED VOLUME 536871042.
05/24/2018 23:36:52 mirror.ubuntu-ports.readonly (536871042) updated 05/24/2018 06:08
05/24/2018 23:36:52 totalInodes 32894

It doesn't seem to mention orphaned files ... it looks for
/vicepa/V0536871041.vol, which isn't there:

16923 open("/vicepa/V0536871041.vol", O_RDONLY) = -1 ENOENT (No such file or directory)
16923 gettimeofday({1527205271, 228504}, NULL) = 0
16923 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
16923 write(7, "05/24/2018 23:41:11 VReadVolumeD"..., 96) = 96

And then, as you say, starts looking at /vicepa/AFSIDat/=0/=0++U

16923 openat(AT_FDCWD, "/vicepa", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 10
16923 getdents(10, /* 29 entries */, 32768) = 1088
16923 getdents(10, /* 0 entries */, 32768) = 0
16923 close(10) = 0
16923 open("/vicepa/salvage.inodes.vicepa.16923", O_RDWR|O_CREAT|O_TRUNC, 0666) = 10
16923 unlink("/vicepa/salvage.inodes.vicepa.16923") = 0
16923 stat("/vicepa", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
16923 openat(AT_FDCWD, "/vicepa/AFSIDat/=0/=0++U/special", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 11
16923 getdents(11, /* 6 entries */, 32768) = 176
16923 stat("/vicepa/AFSIDat/=0/=0++U/special/zzzzD66+++0", {st_mode=S_IFREG|03, st_size=8205056, ...}) = 0
16923 fstat(10, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
16923 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f89ff67b000
16923 stat("/vicepa/AFSIDat/=0/=0++U/special/zzzzP26+++0", {st_mode=S_IFREG|06, st_size=375950, ...}) = 0
16923 open("/vicepa/AFSIDat/=0/=0++U/special/zzzzP26+++0", O_RDWR) = 12
(and so on)
...

Is it "safe" -- in the context of recovery of a RO mirror from an
uncontrolled storage loss which is not responding to salvage attempts
-- to simply remove the AFSIDat/... directory for the volume and let
the "vos release" process re-create everything? Is that private for
that volume; i.e. removing it won't make things worse for *other*
volumes at least?

-i
Michael Meffie
2018-05-29 15:56:52 UTC
On Mon, 28 May 2018 10:16:17 +1000
Post by Ian Wienand
We (with the help of auristor) came to a similar conclusion for
another of our corrupted volumes using strace(). How did you
calculate that hash from the volume number to the AFSIDat/ path?
The afs-tools repo on GitHub has a small Perl script to calculate the namei
paths from the volume numbers, and the volume numbers from the namei paths:

https://github.com/openafs-contrib/afs-tools/blob/master/debug/volnamei
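The name mangling itself is small. Here is a rough bash sketch of the
same calculation (this is just my reading of the namei scheme, checked
only against the =0/=0++U path from this thread; use the script above
for real work):

# namei "flipbase64" encoding: emit the value 6 bits at a time, least
# significant bits first, using this 64-character table
chars='+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
flipbase64() {
    local n=$1 out=''
    [ "$n" -eq 0 ] && { echo '+'; return; }
    while [ "$n" -gt 0 ]; do
        out="$out${chars:$((n & 63)):1}"
        n=$((n >> 6))
    done
    echo "$out"
}
rwid=536871041    # the read-write (parent) volume id
echo "AFSIDat/$(flipbase64 $((rwid & 255)))/$(flipbase64 $rwid)"
# prints AFSIDat/=0/=0++U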
Post by Ian Wienand
We ended up rm-ing the directory, and the release of that volume
worked.
Glad you were able to get things back in working order.
Post by Ian Wienand
Post by Michael Meffie
A full partition salvage on the ro server should remove the orphaned files,
bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS
I did run this as suggested, but it didn't seem sufficient to find
these orphaned files.
I think I misspoke here; the orphans referred to by the -orphans option are
vnodes which are not referenced by directory entries. In your case, I believe
the volume header was missing (at least).
Post by Ian Wienand
Here is the salvage log for
# bos salvage -server localhost -localauth -partition a -showlog -orphans attach -forceDAFS -volume mirror.ubuntu-ports.readonly
The -volume option was specified here, so the salvager takes a different code
path. Without the -volume option, the salvager is more aggressive and removes
all the old unreferenced files. Testing this in a test cell, I see the salvager
removes the unreferenced files and then I'm able to do a vos release to
recreate the read-only volume.

(Also note, if you do perform a per-volume salvage, you want to give the name
or id of the parent volume, that is, the read-write volume, to the salvager
since it is dealing with the so-called volume group, not a single volume in the
group.)
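For example, in this case that would be either a full partition
salvage,

bos salvage -server afs02 -partition a -showlog -forceDAFS

or a per-volume salvage that names the read-write volume rather than
the .readonly clone:

bos salvage -server afs02 -partition a -volume mirror.ubuntu-ports -showlog -forceDAFS

(adjust the server and partition names for your cell, of course).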
Post by Ian Wienand
Is it "safe" -- in the context of recovery of a RO mirror from an
uncontrolled storage loss which is not responding to salvage attempts
-- to simply remove the AFSIDat/... directory for the volume and let
the "vos release" process re-create everything? Is that private for
that volume; i.e. removing it won't make things worse for *other*
volumes at least?
I would try a full partition salvage first if possible. Yes, all the data and
metadata for a given volume group are stored in the AFSIDat subdirectories,
hashed by the read-write volume id:

/vicep<XX>/AFSIDat/<hash(rw-id & 0xff)>/<hash(rw-id)>/

plus the volume header in /vicep<XX>/V<id>.vol, which specifies the special files
for the volume group. So removing all the files under /vicep<XX>/AFSIDat/xx/yyyy
(while the server processes are off) would not affect other volumes, and should
be OK if you are very careful and do not accidentally remove volume groups with
read-write data.
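If you want to double-check which volume group a given AFSIDat
directory belongs to before removing anything, you can decode the
second path component back to the read-write volume id. A rough bash
sketch of the inverse of the mapping above (same caveats as the
earlier sketch; treat it as a cross-check only):

chars='+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
unflipbase64() {
    local s=$1 n=0 bits=0 c prefix
    while [ -n "$s" ]; do
        c=${s:0:1}; s=${s:1}
        prefix=${chars%%"$c"*}            # table entries before $c give its index
        n=$((n | (${#prefix} << bits)))
        bits=$((bits + 6))
    done
    echo "$n"
}
unflipbase64 '=0++U'    # prints 536871041, the mirror.ubuntu-ports RW id

You can then feed that id to vos listvldb (or the volnamei script) to
confirm which volume you are about to touch.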
--
Michael Meffie <***@sinenomine.net>