* Re: [RFC 02/32] uapi: add struct __kernel_timespec{32,64}
  2014-05-30 20:01 ` [RFC 02/32] uapi: add struct __kernel_timespec{32,64} Arnd Bergmann
@ 2014-05-30 20:18 ` H. Peter Anvin
  2014-05-31 15:09   ` Arnd Bergmann
  0 siblings, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-05-30 20:18 UTC (permalink / raw)
To: Arnd Bergmann, linux-kernel
Cc: linux-arch, joseph, john.stultz, hch, tglx, geert, lftan, linux-fsdevel
On 05/30/2014 01:01 PM, Arnd Bergmann wrote:
> We cannot use time_t or any derived structures beyond the year
> 2038 in interfaces between kernel and user space, on 32-bit
> machines.
>
> This is my suggestion for how to migrate syscall and ioctl
> interfaces: We completely phase out time_t, timeval and timespec
> from the uapi header files and replace them with types that are
> either explicitly safe (__kernel_timespec64), or explicitly
> unsafe (e.g. __kernel_timespec32). For each unsafe interface,
> there needs to be a safe replacement interface.
>
This gets really messy for structures where this is ABI-dependent. I'm
not sure this is a net win.
> +/*
> + * __kernel_timespec64 is the general type to be used for
> + * new user space interfaces passing a time argument.
> + * 64-bit nanoseconds is a bit silly, but the advantage is
> + * that it is compatible with the native 'struct timespec'
> + * on 64-bit user space. This simplifies the compat code.
> + */
> +struct __kernel_timespec64 {
> + long long tv_sec;
> + long long tv_nsec;
> +};
So it seems that it is not just POSIX that is drain bramaged with this,
but the "long" type for tv_nsec idiocy has made it into the C11
standard. This unfortunately means that now there are two standards
bodies involved, at least one of which moves very slowly.
This makes me wonder if we don't need to deal with the problem in the
case of 32-bit ABIs with 64-bit time_t. The logical thing seems to be
to EITHER:
a. ALWAYS ignore the upper 32 bits of tv_nsec when read from user space,
but always set them to zero, or
b. Only ignore the upper 32 bits of tv_nsec when we are known to come
from a 32-bit ABI context, but still always return zero. These bits
are already only used for validity checking.
This most likely introduces a whole lot of new tests in deep paths,
although we probably can centralize this in a single function, which
otherwise ends up looking a lot like compat_get_timespec().
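A rough user-space sketch of what such a centralized helper for option (b) might look like, purely for illustration. The names user_timespec64, get_user_timespec64() and from_32bit_abi are made up here and are not part of the patch set; a real kernel helper would copy from user memory with copy_from_user() instead of a plain struct assignment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, mirroring the proposed __kernel_timespec64. */
struct user_timespec64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

#define NSEC_PER_SEC 1000000000LL

/*
 * Option (b): drop the upper 32 bits of tv_nsec only when the caller
 * is known to use a 32-bit ABI, then apply the usual range check.
 * Returns 0 on success or -EINVAL if tv_nsec is out of range.
 */
static int get_user_timespec64(struct user_timespec64 *dst,
			       const struct user_timespec64 *src,
			       bool from_32bit_abi)
{
	*dst = *src;
	if (from_32bit_abi)
		dst->tv_nsec &= 0xffffffffLL;	/* keep the low 32 bits only */
	if (dst->tv_nsec < 0 || dst->tv_nsec >= NSEC_PER_SEC)
		return -EINVAL;
	return 0;
}
```

With this shape, garbage in the padding of a 32-bit caller's tv_nsec is ignored, while a 64-bit caller passing the same bits still gets -EINVAL.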
Getting rid of struct timespec on the kernel/user boundary is probably
not really feasible.
-hpa

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 0:37 ` Dave Chinner
@ 2014-05-31 0:41 ` H. Peter Anvin
  2014-05-31 1:14   ` Dave Chinner
  0 siblings, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-05-31 0:41 UTC (permalink / raw)
To: Dave Chinner, Arnd Bergmann
Cc: linux-kernel, linux-arch, joseph, john.stultz, hch, tglx, geert,
lftan, linux-fsdevel, xfs
On 05/30/2014 05:37 PM, Dave Chinner wrote:
>
> IOWs, the filesystem has to be able to reject any attempt to set a
> timestamp that is can't represent on disk otherwise Bad Stuff will
> happen,
Actually it is questionable if it is worse to reject a timestamp or just
let it wrap. Rejecting a valid timestamp is a bit like "You don't
exist, go away."
> and filesystems have to be able to specify in their on
> disk format what timestamp encoding is being used. The solution will
> be different for every filesystem that needs to support time beyond
> 2038.
Actually the cutoff can be really different for each filesystem, not
necessarily 2038. However, I maintain the above still holds.
Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
would you have expected such a filesystem to do on Jan 1, 2000?
-hpa

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 0:41 ` H. Peter Anvin
@ 2014-05-31 1:14 ` Dave Chinner
  2014-05-31 1:22   ` H. Peter Anvin
  2014-05-31 15:37   ` Arnd Bergmann
  0 siblings, 2 replies; 124+ messages in thread
From: Dave Chinner @ 2014-05-31 1:14 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Arnd Bergmann, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >
> > IOWs, the filesystem has to be able to reject any attempt to set a
> > timestamp that is can't represent on disk otherwise Bad Stuff will
> > happen,
>
> Actually it is questionable if it is worse to reject a timestamp or just
> let it wrap. Rejecting a valid timestamp is a bit like "You don't
> exist, go away."
I think having the new system calls be able to
return EINVAL if the value cannot be stored permanently on disk
correctly is the right thing to do. Having it silently mangled
by the filesystem and returning "everything is just fine, trust me"
is close to the worst solution I can think of. That's exactly what
leads to overflow bugs occurring....
> > and filesystems have to be able to specify in their on
> > disk format what timestamp encoding is being used. The solution will
> > be different for every filesystem that needs to support time beyond
> > 2038.
>
> Actually the cutoff can be really different for each filesystem, not
> necessarily 2038. However, I maintain the above still holds.
Sure, but all filesystems are supposed to handle at least the
current unix epoch.
> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
> would you have expected such a filesystem to do on Jan 1, 2000?
Strawman.
We don't need to cater for fundamentally broken designs that can't
even handle the current unix epoch correctly. If such filesystems
exist, then they can simply say "original unix epoch support only"
and do whatever crap they are doing right now.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 1:14 ` Dave Chinner
@ 2014-05-31 1:22 ` H. Peter Anvin
  2014-05-31 5:54   ` Dave Chinner
  2014-05-31 15:37 ` Arnd Bergmann
  1 sibling, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-05-31 1:22 UTC (permalink / raw)
To: Dave Chinner
Cc: Arnd Bergmann, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
No, not a strawman. Replace with Jan 19, 2038 and you have the same situation.
On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
>On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
>> On 05/30/2014 05:37 PM, Dave Chinner wrote:
>> >
>> > IOWs, the filesystem has to be able to reject any attempt to set a
>> > timestamp that is can't represent on disk otherwise Bad Stuff will
>> > happen,
>>
>> Actually it is questionable if it is worse to reject a timestamp or
>just
>> let it wrap. Rejecting a valid timestamp is a bit like "You don't
>> exist, go away."
>
>I think having the new systems calls being able to
>return EINVAL if the value cannot be stored permanently on disk
>correctly is the right thing to do. Having it silently mangled
>by the filesystem and returning "everything is just fine, trust me"
>is close to the worst solution I can think of. That's exactly what
>leads to overflow bugs occurring....
>
>> > and filesystems have to be able to specify in their on
>> > disk format what timestamp encoding is being used. The solution
>will
>> > be different for every filesystem that needs to support time beyond
>> > 2038.
>>
>> Actually the cutoff can be really different for each filesystem, not
>> necessarily 2038. However, I maintain the above still holds.
>
>Sure, but all filesystems are supposed to handle at least the
>current unix epoch.
>
>> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format.
>What
>> would you have expected such a filesystem to do on Jan 1, 2000?
>
>Strawman.
>
>We don't need to cater for fundamentally broken designs that can't
>even handle the current unix epoch correctly. If such filesystems
>exist, then they can simple say "original unix epoch support only"
>and do whatever crap they are doing right now.
>
>Cheers,
>
>Dave.
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 1:22 ` H. Peter Anvin
@ 2014-05-31 5:54 ` Dave Chinner
  2014-05-31 8:41   ` H. Peter Anvin
  2014-06-02 14:00   ` Joseph S. Myers
  0 siblings, 2 replies; 124+ messages in thread
From: Dave Chinner @ 2014-05-31 5:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Arnd Bergmann, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
[ Please don't top post. ]
On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote:
> On May 30, 2014 6:14:50 PM PDT, Dave Chinner <david@fromorbit.com> wrote:
> >On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> >> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >> >
> >> > IOWs, the filesystem has to be able to reject any attempt to
> >> > set a timestamp that is can't represent on disk otherwise Bad
> >> > Stuff will happen,
> >>
> >> Actually it is questionable if it is worse to reject a
> >> timestamp or
> >just
> >> let it wrap. Rejecting a valid timestamp is a bit like "You
> >> don't exist, go away."
> >
> >I think having the new systems calls being able to return EINVAL
> >if the value cannot be stored permanently on disk correctly is
> >the right thing to do. Having it silently mangled by the
> >filesystem and returning "everything is just fine, trust me" is
> >close to the worst solution I can think of. That's exactly what
> >leads to overflow bugs occurring....
> >
> >> > and filesystems have to be able to specify in their on disk
> >> > format what timestamp encoding is being used. The solution
> >will
> >> > be different for every filesystem that needs to support time
> >> > beyond 2038.
> >>
> >> Actually the cutoff can be really different for each
> >> filesystem, not necessarily 2038. However, I maintain the
> >> above still holds.
> >
> >Sure, but all filesystems are supposed to handle at least the
> >current unix epoch.
> >
> >> Consider a filesystem that kept timestamps in YYMMDDHHMMSS
> >> format.
> >What
> >> would you have expected such a filesystem to do on Jan 1, 2000?
> >
> >Strawman.
> >
> >We don't need to cater for fundamentally broken designs that
> >can't even handle the current unix epoch correctly. If such
> >filesystems exist, then they can simple say "original unix epoch
> >support only" and do whatever crap they are doing right now.
>
> No, not a strawman. Replace with Jan 26, 2038 and you have the
> same situation.
But that's not the problem I'm talking about. The problem isn't the
roll-over date of the epoch - the problem is that we're changing the
in-memory meaning of time without changing what the filesystems
store on disk or how they translate them.
To use your example, what I'm actually talking about is the kernel
switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on
disk. The filesystem doesn't know the timestamp is now a different
format, so it could mangle it writing it to disk, or it could mangle
existing timestamps in the YY.. format reading them from disk and
putting them into CC.. format structures. IOWs, it will
incorrectly translate YY format dates to CC format, or translate
something in the CC format as though it was in YY format. And it
wouldn't even know what was the correct format because there's
nothing telling it on disk whether the date is in CC or YY format.
Either way, you get mangled timestamps, the filesystem doesn't know
about it because it's just storing what the kernel gives it, the
kernel thinks they are fine because they are just opaque when read
back, but the user says "what the fuck did a reboot do to all these
timestamps?".
Hence your example of roll-over dates is a strawman - you've
constructed a problem that is irrelevant to the issue being pointed
out.
FWIW, we already have code in the superblock and VFS to avoid such
problems on filesystems with limited timestamp resolution (i.e.
s_time_gran and current_fs_time()) so that what the VFS hands the
filesystem is exactly what the VFS expects to get back from disk
when comparing timestamps.
If we are changing the in-kernel timestamp to have a greater dynamic
range than anything we currently support on disk, then we need support
for all filesystems for similar translation and constraint. The
filesystems need to be able to tell the kernel what timestamp
range they support, and then the kernel needs to follow those
guidelines. And if the filesystem is mounted on a kernel that
doesn't support the current filesystem's timestamp format, then at
minimum that filesystem cannot do anything that writes a
timestamp....
Put simply: the filesystem defines the timestamp range that can be
used safely, not the userspace API. If the filesystem can't support
the date it is handed then that is an out-of-range error. Since
when have we accepted that it's OK to handle out-of-range data with
silent overflows or corruption of the data that we are attempting to
store? We're defining a new API to support a wider date range -
there is nothing that prevents us from saying ERANGE can be returned
to a timestamp that the file cannot store correctly....
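A minimal sketch of the idea that the filesystem, not the userspace API, defines the usable range. The names fs_time_range and fs_check_timestamp() and the example limits are invented for illustration; no such interface appears in the patch set.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since the Unix epoch. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Example: a filesystem that stores signed 32-bit seconds on disk. */
static const struct fs_time_range signed32_range = {
	.min_sec = INT32_MIN,
	.max_sec = INT32_MAX,
};

/* Return 0 if tv_sec can be stored exactly, -ERANGE otherwise. */
static int fs_check_timestamp(const struct fs_time_range *r, int64_t tv_sec)
{
	if (tv_sec < r->min_sec || tv_sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

A new syscall could run such a check before handing the value to the filesystem, so the caller sees the error instead of a silently mangled timestamp.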
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 5:54 ` Dave Chinner
@ 2014-05-31 8:41 ` H. Peter Anvin
  2014-05-31 15:46   ` Nicolas Pitre
  2014-06-01 0:39   ` Dave Chinner
  2014-06-02 14:00 ` Joseph S. Myers
  1 sibling, 2 replies; 124+ messages in thread
From: H. Peter Anvin @ 2014-05-31 8:41 UTC (permalink / raw)
To: Dave Chinner
Cc: Arnd Bergmann, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On 05/30/2014 10:54 PM, Dave Chinner wrote:
>
> If we are changing the in-kernel timestamp to have a greater dynamic
> range that anything we current support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what they timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly....
>
I'm still puzzled.
Are you saying that you want a program that does:
/* Deliberately simplified */
gettimeofdayns(&now ...);
utimensat(... now);
... to suddenly start failing on Jan 19, 2038 (for a filesystem with
32-bit timestamps), or would you propose some ways for the filesystems
in question to extend the range of the timestamps?
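For reference, the simplified pair of calls above could be spelled with existing POSIX interfaces roughly as follows. This is only a sketch: gettimeofdayns() is shorthand in the example rather than a real call, so clock_gettime() stands in for it, and touch_now() is a name made up here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>	/* AT_FDCWD */
#include <sys/stat.h>	/* utimensat() */
#include <time.h>	/* clock_gettime() */

/* Stamp 'path' with the current time; returns 0, or -1 with errno set. */
static int touch_now(const char *path)
{
	struct timespec now;
	struct timespec times[2];

	if (clock_gettime(CLOCK_REALTIME, &now) != 0)
		return -1;
	times[0] = now;	/* access time */
	times[1] = now;	/* modification time */
	return utimensat(AT_FDCWD, path, times, 0);
}
```

Under the out-of-range-error proposal, the utimensat() call here is the one that would begin to fail on a filesystem with 32-bit timestamps once the clock passes Jan 19, 2038.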
What you seem to propose also seems to imply that on Jan 19, 2038
anything that writes a timestamp with the current date (which logically
ends up being almost every write operation) would be dead and frozen on
such a filesystem -- pretty much meaning the filesystem would become
readonly if not in reality then in practice.
I strongly suspect that that would be a more catastrophic failure than
incorrect timestamps, as you suddenly have all kinds of machines
embedded in $DEITY knows what places just stop and refuse to run.
If that is not what you mean I would genuinely like to understand the
situation better.
-hpa

* Re: [RFC 13/32] ext3: convert to struct inode_time
  2014-05-31 9:10 ` H. Peter Anvin
@ 2014-05-31 14:32 ` Arnd Bergmann
  0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-05-31 14:32 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-kernel, linux-arch, joseph, john.stultz, hch, tglx, geert,
lftan, linux-fsdevel, Jan Kara, Andrew Morton, Andreas Dilger,
linux-ext4
On Saturday 31 May 2014 02:10:45 H. Peter Anvin wrote:
> On 05/30/2014 01:01 PM, Arnd Bergmann wrote:
> > ext3fs uses unsigned 32-bit seconds for inode timestamps, which will work
> > for the next 92 years, but the VFS uses struct timespec for timestamps,
> > which is only good until 2038 on 32-bit CPUs.
> >
> > This gets us one small step closer to lifting the VFS limit by using
> > struct inode_time in ext3. The on-disk format limit is lifted in ext4,
> > which will work until 2514.
> >
>
> This may be what the spec says, but when I experimented with this just
> now it does seem that both ext2 and ext3 actually interpret timestamps
> as *signed* 32-bit seconds.
Right, I can see that in ext3_iget() now:
inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
I may have just looked at ext3_do_update_inode(), which uses this
unsigned conversion:
raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
and didn't realize that this is only half of the story, and since it
converts from (potentially 64-bit) long to u32, it doesn't matter
whether that is signed or unsigned.
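The asymmetry between the two directions can be shown with a small stand-alone sketch. The helper names are made up, the endianness conversion (le32_to_cpu/cpu_to_le32) is left out, and the narrowing cast assumes the usual two's-complement wraparound that GCC and Clang provide.

```c
#include <stdint.h>

/* Read side, as in ext3_iget(): the on-disk value is treated as signed. */
static int64_t disk_to_sec_signed(uint32_t raw)
{
	return (int32_t)raw;	/* sign-extends values above 0x7fffffff */
}

/* What an unsigned interpretation of the same field would give. */
static int64_t disk_to_sec_unsigned(uint32_t raw)
{
	return raw;
}

/* Write side: truncation keeps the low 32 bits, signed or not. */
static uint32_t sec_to_disk(int64_t sec)
{
	return (uint32_t)sec;
}
```

Only the read side decides whether 0x80000000 means 1901 or 2038; the write side stores the same bits either way.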
I may have to go through all of them again to see if I made the same
mistake in other file systems as well.
Arnd

* Re: [RFC 02/32] uapi: add struct __kernel_timespec{32,64}
  2014-05-30 20:18 ` H. Peter Anvin
@ 2014-05-31 15:09 ` Arnd Bergmann
  0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-05-31 15:09 UTC (permalink / raw)
To: H. Peter Anvin
Cc: linux-kernel, linux-arch, joseph, john.stultz, hch, tglx, geert,
lftan, linux-fsdevel
On Friday 30 May 2014 13:18:45 H. Peter Anvin wrote:
> On 05/30/2014 01:01 PM, Arnd Bergmann wrote:
> > We cannot use time_t or any derived structures beyond the year
> > 2038 in interfaces between kernel and user space, on 32-bit
> > machines.
> >
> > This is my suggestion for how to migrate syscall and ioctl
> > interfaces: We completely phase out time_t, timeval and timespec
> > from the uapi header files and replace them with types that are
> > either explicitly safe (__kernel_timespec64), or explicitly
> > unsafe (e.g. __kernel_timespec32). For each unsafe interface,
> > there needs to be a safe replacement interface.
> >
>
> This gets really messy for structures where this is ABI-dependent. I'm
> not sure this is a net win.
We could have an extra '__kernel_oldtimespec' type that we can
use for all ABIs that are today defined in terms of timespec.
What I was mostly trying to avoid here is leaving any 'struct timespec'
in header files, because glibc may define that type differently
depending on a __TIME_BITS macro. This is more of a problem for
ioctls than for system calls.
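To illustrate why this matters for ioctls, here is a small sketch with fixed-width stand-ins for the two layouts. All struct names are hypothetical and the ioctl argument is invented for the example.

```c
#include <stdint.h>

/* Illustrative stand-ins for a 32-bit and a 64-bit timespec layout. */
struct demo_timespec32 { int32_t tv_sec; int32_t tv_nsec; };
struct demo_timespec64 { int64_t tv_sec; int64_t tv_nsec; };

/* A made-up ioctl argument that embeds a timestamp. */
struct demo_ioctl_arg32 { uint32_t id; struct demo_timespec32 ts; };
struct demo_ioctl_arg64 { uint32_t id; struct demo_timespec64 ts; };

_Static_assert(sizeof(struct demo_timespec32) == 8, "32-bit layout");
_Static_assert(sizeof(struct demo_timespec64) == 16, "64-bit layout");
_Static_assert(sizeof(struct demo_ioctl_arg32) !=
	       sizeof(struct demo_ioctl_arg64), "ioctl ABI differs");
```

If a header spells the member as plain 'struct timespec', the layout the kernel receives depends on how the C library was configured, which is what explicitly sized types avoid.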
> > +/*
> > + * __kernel_timespec64 is the general type to be used for
> > + * new user space interfaces passing a time argument.
> > + * 64-bit nanoseconds is a bit silly, but the advantage is
> > + * that it is compatible with the native 'struct timespec'
> > + * on 64-bit user space. This simplifies the compat code.
> > + */
> > +struct __kernel_timespec64 {
> > + long long tv_sec;
> > + long long tv_nsec;
> > +};
>
> So it seems that it is not just POSIX that is drain bramaged with this,
> but the "long" type for tv_nsec idiocy has made it into the C11
> standard. This unfortunately means that now there are two standards
> bodies involved, at least one of which moves very slowly.
My feeling is that our best hope is to completely isolate the kernel
interfaces from what user space wants to have as time_t. glibc for
instance may have a different idea about standards compliance than
android or klibc.
> This makes me wonder if we don't need to deal with the problem in the
> case of 32-bit ABIs with 64-bit time_t. The logical thing seems to be
> to EITHER:
>
> a. ALWAYS ignore the upper 32 bits of tv_nsec when read from user space,
> but always set them to zero, or
> b. Only ignore the upper 32 bits of tv_nsec when we are known to come
> from a 32-bit ABI context, but still always return zero. These bits
> are already only used for validity checking.
>
> This most likely introduces a whole lot of new tests in deep paths,
> although we probably can centralize this in a single function, which
> otherwise ends up looking a lot like compat_get_timespec().
>
> Getting rid of struct timespec on the kernel/user boundary is probably
> not really feasible.
My approach was based on the discussion with Joseph, who would like glibc
to support both 32 and 64 bit time_t using the same libc binary and
versioned symbols. I don't see how that could work when you build a
user space program that sees a timespec in kernel headers and tries
to pass that into a non-translated kernel interface (e.g. ioctl) but
use the same timespec for a glibc-provided function like gettimeofday().
Arnd

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 1:14 ` Dave Chinner
  2014-05-31 1:22   ` H. Peter Anvin
@ 2014-05-31 15:37 ` Arnd Bergmann
  2014-06-01 0:24   ` Dave Chinner
  1 sibling, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-05-31 15:37 UTC (permalink / raw)
To: Dave Chinner
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > >
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > happen,
> >
> > Actually it is questionable if it is worse to reject a timestamp or just
> > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > exist, go away."
>
> I think having the new systems calls being able to
> return EINVAL if the value cannot be stored permanently on disk
> correctly is the right thing to do. Having it silently mangled
> by the filesystem and returning "everything is just fine, trust me"
> is close to the worst solution I can think of. That's exactly what
> leads to overflow bugs occurring....
While going through the file systems, I was wondering whether
we should have the times stop at the end of each file system's
epoch rather than wrap around.
> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution will
> > > be different for every filesystem that needs to support time beyond
> > > 2038.
> >
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038. However, I maintain the above still holds.
>
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.
In my list at http://kernelnewbies.org/y2038, I found that almost
all file systems at least handle times until 2106, because they treat
the on-disk value as unsigned on 64-bit systems, or they use
a completely different representation. My guess is that somebody
earlier spent a lot of work on making that happen.
The exceptions are:
* exofs uses signed values, which can probably be changed to be
consistent with the others.
* isofs has a bug that limits it until 2027 on architectures with
a signed 'char' type (otherwise it's 2155).
* udf can represent times for many thousands of years through a
16-bit year representation, but the code to convert to epoch
uses a const array that ends at 2038.
* afs uses signed seconds and can probably be fixed
* coda relies on user space time representation getting passed
through an ioctl.
* I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
where they really use signed.
I was confused about XFS since I didn't notice that there are
separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
XFS to also use the 1970-2106 time range on 64-bit systems today.
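The 2038 and 2106 limits follow directly from the width and signedness of a 32-bit seconds field. A quick check, assuming a build host with a 64-bit time_t (year_of() is a helper made up for this sketch):

```c
#include <stdint.h>
#include <time.h>

/* Calendar year (UTC) of a second count since the Unix epoch, or -1. */
static int year_of(int64_t sec)
{
	time_t t = (time_t)sec;	/* assumes time_t is 64 bits wide */
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

year_of(INT32_MAX) gives the last year a signed 32-bit field can reach, and year_of(UINT32_MAX) the last year for an unsigned one.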
If we are using the variant of my patch that extends
inode_time->tv_sec to s64, nothing should change for XFS
at all; the main difference is that if it gets extended
to wider on-disk timestamps, they will work the same way on
32-bit and 64-bit kernels.
Arnd

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 8:41 ` H. Peter Anvin
@ 2014-05-31 15:46 ` Nicolas Pitre
  2014-06-01 19:56   ` Arnd Bergmann
  2014-06-01 0:39 ` Dave Chinner
  1 sibling, 1 reply; 124+ messages in thread
From: Nicolas Pitre @ 2014-05-31 15:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Dave Chinner, Arnd Bergmann, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sat, 31 May 2014, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns(&now ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps), or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
>
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.
For those (legacy) filesystems with signed 32-bit timestamps, any
attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
(silently) clamped to 0x7fffffff and that value (the last representable
time) used as an overflow indicator. The filesystem driver should
convert that value into a corresponding overflow value for whatever
kernel internal time representation being used when read back, and this
should be propagated up to user space. It should not be a hard error
otherwise, as you rightfully stated, everything non read-only would come
to a halt on that day.
Inside the kernel, the overflow indicator could be as simple as
dedicating one of the top bits in a 64-bit time_t value in order to still
transmit the overflow limit. For example, in the above case, we could
use 0x40000000-7fffffff to indicate the actual time is unavailable due
to the filesystem's time representation being overflowed from
0x7fffffff.
If for example a filesystem cannot represent timestamps from Jan 1
00:00:00 2100 UTC then the overflow representation for this particular
filesystem would be 0x40000000-f48656ff.
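A sketch of that encoding, reading 0x40000000-7fffffff as the 64-bit value 0x400000007fffffff, i.e. bit 62 used as the overflow marker. The macro and function names are hypothetical, and fs_max stands for the last second a given filesystem can represent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-kernel marker: bit 62 means "clamped by the filesystem". */
#define TIME_OVERFLOW_FLAG (1LL << 62)

/* Write side: clamp sec to the filesystem's last representable second. */
static int64_t fs_clamp_time(int64_t sec, int64_t fs_max)
{
	return sec > fs_max ? fs_max : sec;
}

/* Read side: a stored value equal to fs_max is reported as overflowed. */
static int64_t fs_read_time(int64_t stored, int64_t fs_max)
{
	return stored == fs_max ? (TIME_OVERFLOW_FLAG | fs_max) : stored;
}

/*
 * Negative times also have bit 62 set in two's complement, so test the
 * magnitude rather than the bit alone.
 */
static bool time_overflowed(int64_t t)
{
	return t >= TIME_OVERFLOW_FLAG;
}
```

With fs_max set to 0x7fffffff this yields 0x400000007fffffff on read-back, and with 0xf48656ff (one second before Jan 1 2100) it yields 0x40000000f48656ff, matching the two examples above.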
Those syscalls with a 32-bit time_t would be returned 0x7fffffff
whenever there is an overflow being signaled. Whether 64-bit
overflow-marked time_t values, when passed to user space, should clear
the overflow bit, or use a unique time_t overflow value, could be
decided and even changed later after discussion with glibc people for
example.
Hard errors should be signaled to user space, and the actual operation
aborted, only with the presence of a new flag passed to the kernel.
However, by default, things should "just work" albeit with the "wrong"
i.e. clamped time being saved on disk as much as possible otherwise.
Nicolas

* Re: [RFC 11/32] xfs: convert to struct inode_time
  2014-05-31 15:37 ` Arnd Bergmann
@ 2014-06-01 0:24 ` Dave Chinner
  2014-06-02 0:28   ` Dave Chinner
  0 siblings, 1 reply; 124+ messages in thread
From: Dave Chinner @ 2014-06-01 0:24 UTC (permalink / raw)
To: Arnd Bergmann
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > >
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > >
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> >
> > I think having the new systems calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring....
>
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file systems
> epoch rather than wrap around.
>
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > >
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038. However, I maintain the above still holds.
> >
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
>
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems at least times until 2106, because they treat
> the on-disk value as unsigned on 64-bit systems, or they use
> a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
>
> The exceptions are:
>
> * exofs uses signed values, which can probably be changed to be
> consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
> a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
> 16-bit year representation, but the code to convert to epoch
> uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
> through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> where they really use signed.
>
> I was confused about XFS since I didn't noticed that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.

You've missed an awful lot more than just the implications for the
core kernel code.

There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.

IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...

Put simply, changing the structure of system time isn't as
straightforward as changing the kernel structures. System time gets
stored permanently, and that has a cascade effect through the kernel
all the way to all of the filesystem utilities that know about that
permanent storage in some way....

So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.

> If we are using the variant of my patch that extends
> inode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.
"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.
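
To make that failure mode concrete, here is a minimal userspace sketch of
what happens when a 64-bit seconds value is squeezed into a signed 32-bit
on-disk field and read back. store_sec32() and load_sec32() are hypothetical
helpers written for illustration, not XFS functions.

```c
#include <stdint.h>

/* Keep only the low 32 bits, as a 32-bit on-disk field would. */
static uint32_t store_sec32(int64_t sec)
{
	return (uint32_t)sec;	/* well-defined: reduces modulo 2^32 */
}

/* Read the field back as a signed 32-bit value, widened to 64 bits. */
static int64_t load_sec32(uint32_t disk)
{
	if (disk & 0x80000000u)
		return (int64_t)disk - 4294967296LL;
	return (int64_t)disk;
}
```

Storing 2147483648 (one second past the signed 32-bit limit) and loading it
back gives -2147483648, a date in December 1901, while the cached in-memory
inode keeps showing the value the user asked for until it is evicted.
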
To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being passed in. There is no excuse for silently mangling
out-of-range data, especially as we have plenty of time to add
support to the filesystems so that such errors don't occur. It might
take us a year to implement, but it will be done long before the
epoch overflows.
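
The hard-fail behaviour argued for above could look roughly like the range
check below. This is only a sketch under stated assumptions: struct
fs_time_range and fs_check_timestamp() are hypothetical names, the limits
shown are those of a signed 32-bit seconds field, and ERANGE is just one of
the error codes suggested above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical description of what an on-disk format can hold. */
struct fs_time_range {
	int64_t min_sec;	/* earliest representable second */
	int64_t max_sec;	/* latest representable second */
};

/* A format with signed 32-bit seconds: Dec 1901 .. Jan 2038. */
static const struct fs_time_range fs_range_s32 = {
	.min_sec = INT32_MIN,
	.max_sec = INT32_MAX,
};

/*
 * Return 0 if 'sec' can be stored without loss, -ERANGE otherwise.
 * The caller is expected to fail the syscall rather than truncate.
 */
static int fs_check_timestamp(const struct fs_time_range *range, int64_t sec)
{
	if (sec < range->min_sec || sec > range->max_sec)
		return -ERANGE;
	return 0;
}
```

A filesystem that later grows a wider on-disk format would only need to
advertise a larger range; the check itself stays the same.
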
And, FWIW, this patchset needs a set of regression tests that ensure
timestamps beyond 2038 and 2106 don't change across unmount/mount.
Written for xfstests, preferably, so that it's run as part of every
filesystem developer's daily workflow. This is the only way we are
going to ensure that the filesystem and VFS code works correctly and
continues to work correctly up to the end of the current epoch....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 8:41 ` H. Peter Anvin
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 0:39 ` Dave Chinner
1 sibling, 0 replies; 124+ messages in thread
From: Dave Chinner @ 2014-06-01 0:39 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Arnd Bergmann, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range than anything we currently support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp....
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly....
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns(&now ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),
Yes. Hard fail so overflows are in your face and we know exactly
what is going to cause silent timestamp screwups when the epoch
> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
Filesystems are going to have to change their on-disk formats, so
we'd do that just like we do every other on-disk format change. With
feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted
out across the kernel and userspace code...
Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new filesystems
are guaranteed to be designed to handle 2038 epoch rollover, and so
in a year or two this "hard fail" is effectively a non-problem. If
someone breaks something in future, then we'll know about it pretty
quickly.
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality then in practice.
Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is
"shut down the filesystem" error handling. Kind of like using BUG()
rather than returning an error. That's why we need to be able to
hard fail and return an error.
However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the mean time,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...
> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.
Yup, that's a great way of flushing out problems 20 years before
they really matter.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 15:46 ` Nicolas Pitre
@ 2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 1:36 ` Nicolas Pitre
0 siblings, 2 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-01 19:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: H. Peter Anvin, Dave Chinner, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality then in practice.
>
> For those (legacy) filesystems with signed 32-bit timestamps, any
> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> (silently) clamped to 0x7fffffff and that value (the last representable
> time) used as an overflow indicator. The filesystem driver should
> convert that value into a corresponding overflow value for whatever
> kernel internal time representation being used when read back, and this
> should be propagated up to user space. It should not be a hard error
> otherwise, as you rightfully stated, everything non read-only would come
> to a halt on that day.
I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.
The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.
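
For comparison, the clamping alternative can be sketched as below. This is a
minimal illustration, not code from any filesystem: clamp_sec32() and
sec32_is_overflow() are hypothetical names, and clamping at the lower bound
as well is an assumption added for symmetry.

```c
#include <stdbool.h>
#include <stdint.h>

#define FS_SEC32_OVERFLOW INT32_MAX	/* 0x7fffffff doubles as the marker */

/* Clamp a 64-bit seconds value into the signed 32-bit on-disk range. */
static int32_t clamp_sec32(int64_t sec)
{
	if (sec > INT32_MAX)
		return FS_SEC32_OVERFLOW;
	if (sec < INT32_MIN)
		return INT32_MIN;
	return (int32_t)sec;
}

/* On read-back, the maximum value means "this time or anything later". */
static bool sec32_is_overflow(int32_t disk)
{
	return disk == FS_SEC32_OVERFLOW;
}
```

Every timestamp written after the rollover collapses to the same stored
value, which is the "all newly written files having the same timestamp"
effect described above.
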
For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:
* any system calls that pass a time_t, timeval or timespec on
  32-bit systems return -ENOSYS, to ensure all user land uses
  the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
  from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
  fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.
For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
@ 2014-06-01 20:26 ` H. Peter Anvin
2014-06-02 11:02 ` Arnd Bergmann
2014-06-02 1:36 ` Nicolas Pitre
1 sibling, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-06-01 20:26 UTC (permalink / raw)
To: Arnd Bergmann, Nicolas Pitre
Cc: Dave Chinner, linux-kernel, linux-arch, joseph, john.stultz, hch,
tglx, geert, lftan, linux-fsdevel, xfs
Perhaps we should make this a kernel command line option instead, with the
settings: error out on anything outside the standard window, or a date
indicating the earliest date that should be recognized and do windowing
(0 for no windowing, 1970 for retconning the Unix epoch as unsigned...)
But again, the kernel is probably the least problem here...
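
The windowing idea above can be sketched as follows. This is a rough
illustration only: window_decode() is a hypothetical helper, and it takes the
window start in seconds since 1970 rather than as a date on the command line.

```c
#include <stdint.h>

/*
 * Interpret a raw 32-bit seconds value as the unique time t with
 * window_start <= t < window_start + 2^32 and t == raw (mod 2^32).
 */
static int64_t window_decode(uint32_t raw, int64_t window_start)
{
	uint32_t delta = raw - (uint32_t)window_start;	/* modulo 2^32 */

	return window_start + (int64_t)delta;
}
```

With window_start = 0 the raw value is read as unsigned (1970 to 2106); with
window_start = INT32_MIN it is read as the traditional signed value (1901 to
2038). Any other start slides the same 2^32-second window along the timeline.
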
On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann <arnd@arndb.de> wrote:
>On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
>> > readonly if not in reality then in practice.
>>
>> For those (legacy) filesystems with signed 32-bit timestamps, any
>> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
>> (silently) clamped to 0x7fffffff and that value (the last representable
>> time) used as an overflow indicator. The filesystem driver should
>> convert that value into a corresponding overflow value for whatever
>> kernel internal time representation being used when read back, and this
>> should be propagated up to user space. It should not be a hard error
>> otherwise, as you rightfully stated, everything non read-only would come
>> to a halt on that day.
>
>I don't think there is much of a difference between not being able to
>write at all and all newly written files having the same timestamp,
>causing random things to break differently.
>
>The clamp to the maximum supported time stamp sounds like a reasonable
>choice for 'utimens' and related syscalls for the case of someone
>setting an arbitrary future date beyond what the file system can
>represent. Then again, I don't see a reason why that shouldn't just
>cause an error to be returned.
>
>For actually running kernels beyond 2038, the best idea I've seen so
>far is to disallow all broken code at compile time. I don't see
>a choice but to audit the entire kernel for invalid uses on both
>32 and 64 bit in the next few years. A lot of code will get changed
>in the process so we can actually keep running 32-bit kernels and
>file systems, but other code will likely go away:
>
>* any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
>* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using it left out.
>* ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.
>* until xfs gets extended, we can also disable it at build time.
>
>For most users, we probably want to leave all that enabled by
>default until we get much closer to 2038, but a compile time
>option should allow us to test what works or doesn't, and it
>can be set by embedded developers that want to ensure their
>code keeps running for the next few decades.
>
> Arnd
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 19:56 ` Arnd Bergmann
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 2 replies; 124+ messages in thread
From: Nicolas Pitre @ 2014-06-02 1:36 UTC (permalink / raw)
To: Arnd Bergmann
Cc: H. Peter Anvin, Dave Chinner, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last representable
> > time) used as an overflow indicator. The filesystem driver should
> > convert that value into a corresponding overflow value for whatever
> > kernel internal time representation being used when read back, and this
> > should be propagated up to user space. It should not be a hard error
> > otherwise, as you rightfully stated, everything non read-only would come
> > to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.
Well, in one case you have a crash certitude. In the other case you have
some probability that your system might still be usable.
> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.
Resilience is better than outright failure.
> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
> 32-bit systems return -ENOSYS, to ensure all user land uses
> the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
> fine since ext4 can mount old file systems.
Syscalls and libs can be "fixed". Existing filesystem content might
not. So if you need to mount some old media in read-write mode after
2038 and that happens to contain an ext2 or similarly limited filesystem
then it'd better just "work". Having the kernel refuse to modify the
filesystem would be unacceptable.
Nicolas

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
@ 2014-06-02 2:22 ` Dave Chinner
2014-06-02 7:09 ` Geert Uytterhoeven
2014-06-02 10:56 ` Arnd Bergmann
1 sibling, 1 reply; 124+ messages in thread
From: Dave Chinner @ 2014-06-02 2:22 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Arnd Bergmann, H. Peter Anvin, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.
We can already tell the VFS/filesystems not to update timestamps:

	inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the "legacy filesystem timestamp" problem is
"solved".
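
As a rough userspace illustration of enforcing those flags on every
timestamp-updating path: the SK_* flag values and the sk_* helpers below are
stand-ins invented for this sketch, not the kernel's definitions (the real
flags live in <linux/fs.h>).

```c
#include <stdbool.h>

/* Stand-in flag values for this sketch only. */
#define SK_NOATIME	(1u << 0)
#define SK_NOCMTIME	(1u << 1)

struct sk_inode {
	unsigned int i_flags;
};

/* Mark an inode on a legacy filesystem so its timestamps are left alone. */
static void sk_freeze_times(struct sk_inode *inode)
{
	inode->i_flags |= SK_NOATIME | SK_NOCMTIME;
}

/* Every timestamp-updating path would consult checks like these first. */
static bool sk_may_update_atime(const struct sk_inode *inode)
{
	return !(inode->i_flags & SK_NOATIME);
}

static bool sk_may_update_cmtime(const struct sk_inode *inode)
{
	return !(inode->i_flags & SK_NOCMTIME);
}
```
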
New interfaces need to return errors when an out-of-range parameter
is set. And right now, >epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.
Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file >16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (E2BIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 1:36 ` Nicolas Pitre
2014-06-02 2:22 ` Dave Chinner
@ 2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 15:04 ` Chuck Lever
1 sibling, 2 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 10:56 UTC (permalink / raw)
To: Nicolas Pitre
Cc: H. Peter Anvin, Dave Chinner, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> > 32-bit systems return -ENOSYS, to ensure all user land uses
> > the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> > fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might
> not. So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just "work". Having the kernel refuse to modify the
> filesystem would be unacceptable.
I think you misunderstood what I suggested: the intent is to avoid
seeing things break in 2038 by making them break much earlier. We have
a solution for ext2 file systems, it's called ext4, and we just need
to ensure that everybody knows they have to migrate eventually.
At some point before the mid-2030s, you should no longer be able to
build a kernel that has support for ext2 or any other module that will
run into bugs later. Until then (rather sooner than later), I'd like
to get to the point where you can choose whether to include those
modules at build time or not, and then get everybody to turn off that
option and fix the bugs they run into. You wouldn't need that for a
2014-generation long-term support distro (rhel 7, sles 12, debian 7,
ubuntu 14.04, ...), but perhaps for the next generation, or the
one after that.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-01 20:26 ` H. Peter Anvin
@ 2014-06-02 11:02 ` Arnd Bergmann
0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 11:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, Dave Chinner, linux-kernel, linux-arch, joseph,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote:
> Perhaps we should make this a kernel command line option instead, with the
> settings: error out on anything outside the standard window, or a date indicating the
> earliest date that should be recognized and do windowing (0 for no windowing,
> 1970 for retconning the Unix epoch as unsigned...)
What's wrong with compile-time errors? We have a pretty good understanding
of how time values are passed in the kernel, and we know they will all break
in 2038 for 32-bit kernels unless we do something about it.
> But again, the kernel is probably the least problem here...
I agree the glibc side is harder than this, but we have to get the kernel
into shape first (at the minimum we have to do the APIs), and there is enough
work to do here.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 0:28 ` Dave Chinner
@ 2014-06-02 11:35 ` Roger Willcocks
2014-06-02 11:43 ` Arnd Bergmann
1 sibling, 0 replies; 124+ messages in thread
From: Roger Willcocks @ 2014-06-02 11:35 UTC (permalink / raw)
To: Dave Chinner
Cc: Arnd Bergmann, linux-arch, linux-kernel, lftan, hch, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Mon, 2014-06-02 at 10:28 +1000, Dave Chinner wrote:
>
> The 32 bit second counters in timestamps are too small to represent
> time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
> format for a timestamp to include an 8-bit epoch counter so that we
> can extend time for up to 255 Unix epochs. This should be good for
> representing timestamps from 1970 to somewhere around 19,000 A.D....
>
I assume you're using an 'epoch' variable and not simply using the
padding byte as an eight-bit prefix to the existing 32-bit counter
because the existing counter is signed?
For long term sanity it might make more sense for the eight-bit value to
be a simple (sign-extended) prefix from 1970.
So if the feature bit is set it's a 40-bit signed time, which is good
for 1970 +/- 17400 years or so.
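
A minimal sketch of that 40-bit signed layout, assuming the padding byte holds
the top eight bits and the existing 32-bit field holds the low bits;
time40_decode() and time40_encode() are hypothetical names used only here.

```c
#include <stdint.h>

/* Combine a sign-extended 8-bit prefix with the 32-bit counter. */
static int64_t time40_decode(int8_t prefix, uint32_t low)
{
	return (int64_t)prefix * 4294967296LL + (int64_t)low;
}

/* Split a time into prefix and low word; -1 if it needs more than 40 bits. */
static int time40_encode(int64_t sec, int8_t *prefix, uint32_t *low)
{
	const int64_t limit = 1LL << 39;	/* 2^39 s is roughly 17,420 years */

	if (sec < -limit || sec >= limit)
		return -1;
	*low = (uint32_t)sec;			/* low 32 bits, modulo 2^32 */
	*prefix = (int8_t)((sec - (int64_t)*low) / 4294967296LL);
	return 0;
}
```

With a zero prefix, values up to 2106 decode exactly as an unsigned 32-bit
count; a prefix of -1 covers the roughly 136 years before 1970.
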
--
Roger

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 0:28 ` Dave Chinner
2014-06-02 11:35 ` Roger Willcocks
@ 2014-06-02 11:43 ` Arnd Bergmann
2014-06-03 0:32 ` Dave Chinner
1 sibling, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 11:43 UTC (permalink / raw)
To: Dave Chinner
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > all file systems at least handle times until 2106, because they treat
> > > the on-disk value as unsigned on 64-bit systems, or they use
> > > a completely different representation. My guess is that somebody
> > > earlier spent a lot of work on making that happen.
> > >
> > > The exceptions are:
> > >
> > > * exofs uses signed values, which can probably be changed to be
> > > consistent with the others.
> > > * isofs has a bug that limits it until 2027 on architectures with
> > > a signed 'char' type (otherwise it's 2155).
> > > * udf can represent times for many thousands of years through a
> > > 16-bit year representation, but the code to convert to epoch
> > > uses a const array that ends at 2038.
> > > * afs uses signed seconds and can probably be fixed
> > > * coda relies on user space time representation getting passed
> > > through an ioctl.
> > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > where they really use signed.
> > >
> > > I was confused about XFS since I didn't notice that there are
> > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> >
> > You've missed an awful lot more than just the implications for the
> > core kernel code.
> >
> > There's a good chance such changes propagate to APIs elsewhere in
> > the filesystems, because something you haven't realised is that XFS
> > effectively exposes the on-disk timestamp format directly to
> > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > by the online defragmenter.
I really didn't look at them at all, as ioctl is very late on my
mental list of things to change. I do realize that a lot of drivers
and file systems do have ioctls that pass time values and we need to
address them one by one.
I just looked at the ioctls you mentioned but don't see how open-by-handle
is affected by this. Can you point me to what you mean?
> Just to put that in context, here's the kernel patch to add extended
> epoch support to XFS. It's completely untested as I haven't done any
> userspace code changes to enable the feature. However, it should
> give you an indication of how far the simple act of changing the
> kernel time representation spread through the filesystem. This does
> not include any of the VFS infrastructure for specifying the range of
> supported timestamps. It survives some smoke testing, but dies when
> the online defragmenter starts using the bulkstat and swap extent
> ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> probably don't have that all sorted correctly yet...
>
> To test extended epoch support, however, I need some fstests that
> define and validate the behaviour of the new syscalls - until we get
> those we can't validate that the filesystem follows the spec
> properly. I also suspect we are going to need an interface to query
> the supported range of timestamps from a filesystem so that we can
> test boundary conditions in an automated fashion....
Thanks a lot for having an initial look at this yourself!
I'd still consider the two problems largely orthogonal. My patch set
(at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
more like 64-bit kernels regarding inode time stamps, which does
impact all the file systems that use a 64-bit time or the NFS
unsigned epoch (1970-2106), while your patch extends the file
system internal epoch (1901-2038 for XFS) so it can be used by
anything that knows how to handle larger than 32-bit second values
(either 64-bit kernel or 32-bit with inode_time patch).
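
The year ranges quoted above follow directly from the limits of a 32-bit
seconds counter. As a quick check, the sketch below converts a seconds value
to its calendar year using a standard days-to-civil-date calculation;
year_of() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Calendar year (proleptic Gregorian, UTC) for 'sec' seconds since 1970. */
static int year_of(int64_t sec)
{
	int64_t days = sec / 86400;

	if (sec % 86400 < 0)
		days--;			/* floor division for pre-1970 times */

	int64_t z = days + 719468;	/* shift the origin to 0000-03-01 */
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	int64_t doe = z - era * 146097;		/* day within 400-year era */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	int64_t mp = (5 * doy + 2) / 153;	/* month index, 0 = March */

	/* January and February belong to the following calendar year. */
	return (int)(yoe + era * 400 + (mp >= 10));
}
```

Feeding it INT32_MIN, INT32_MAX and UINT32_MAX gives 1901, 2038 and 2106
respectively, matching the signed (1901-2038) and unsigned (1970-2106) ranges.
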
> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
> #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
> #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
>
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields have meaning and can be non-zero.
Nice trick!
> +static inline __uint8_t
> +xfs_timestamp_epoch(
> + struct timespec *time)
> +{
> + /* will be zero until the extended struct inode_time is introduced */
> + return 0;
> +}
> +
> +static inline __int32_t
> +xfs_timestamp_sec(
> + struct timespec *time)
> +{
> + return time->tv_sec;
> +}
> +
> +static inline __kernel_time_t
> +xfs_inode_time_from_epoch(
> + __uint8_t epoch,
> + __int32_t seconds)
> +{
> + /* need to handle non-zero epoch when struct inode_time is introduced */
> + ASSERT(epoch == 0);
> + return seconds;
> +}
Why don't you already implement epoch conversion for 64-bit kernels that
are able to represent the time today? This is how ext4 does it (I mean
the sizeof() trick, not the bit stuffing they do):

static inline __le32 ext4_encode_extra_time(struct inode_time *time)
{
	return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
			   (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
			   ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
}

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
	if (sizeof(time->tv_sec) > 4)
		time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
				<< 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

I guess if there is general agreement on introducing 'struct inode_time',
we can skip that intermediate step.
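
For readers who want to try the bit stuffing outside the kernel, here is a
userspace sketch of the same packing idea. It is an approximation under
stated assumptions: the sk_* names are hypothetical, the 2-bit epoch width is
assumed, endian conversion is left out, and the low 32 bits of the seconds are
treated as unsigned, which may not match how ext4 handles the sign.

```c
#include <stdint.h>

#define EPOCH_BITS	2
#define EPOCH_MASK	((1u << EPOCH_BITS) - 1)	/* low 2 bits: epoch */
#define NSEC_MASK	(~0u << EPOCH_BITS)		/* upper 30 bits: nsec */

struct sk_time {
	int64_t tv_sec;		/* non-negative in this sketch */
	uint32_t tv_nsec;	/* 0 .. 999999999 */
};

/* Pack bits 32-33 of tv_sec and the nanoseconds into one 32-bit word. */
static uint32_t sk_encode_extra(const struct sk_time *t)
{
	uint32_t epoch = (uint32_t)((uint64_t)t->tv_sec >> 32) & EPOCH_MASK;

	return epoch | ((t->tv_nsec << EPOCH_BITS) & NSEC_MASK);
}

/* Rebuild the full time from the low 32 bits of seconds plus the extra word. */
static struct sk_time sk_decode(uint32_t sec_low, uint32_t extra)
{
	struct sk_time t;

	t.tv_sec = (int64_t)(((uint64_t)(extra & EPOCH_MASK) << 32) | sec_low);
	t.tv_nsec = (extra & NSEC_MASK) >> EPOCH_BITS;
	return t;
}
```

Nanoseconds always fit because 999999999 is below 2^30, so the two spare bits
of the 32-bit word are free to extend the seconds.
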
> @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
> }
>
> #define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
> +#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */
> #define XFS_SB_FEAT_INCOMPAT_ALL \
> - (XFS_SB_FEAT_INCOMPAT_FTYPE)
> + (XFS_SB_FEAT_INCOMPAT_FTYPE | \
> + XFS_SB_FEAT_INCOMPAT_EPOCH | \
> + 0)
>
> #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
How does this flag get set? Do you have to manually change it in the
superblock? Since most of the time I'd suspect you wouldn't actually
use it for the foreseeable future, would it make sense to have a mount
option that allows it to be set, but doesn't actually change the
superblock until the first inode gets written with a nonzero epoch?
That way, you'd still be able to mount it with an older kernel but
also be forward compatible with time moving on.
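
A sketch of what I mean, purely hypothetical and in userspace terms: the
sk_super structure, the epoch_allowed field standing in for the mount option,
and sk_prepare_epoch_write() are invented for illustration and are not XFS
code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FEAT_INCOMPAT_EPOCH	(1u << 1)	/* same bit position as the patch */

struct sk_super {
	uint32_t features_incompat;	/* as stored in the on-disk superblock */
	bool epoch_allowed;		/* hypothetical mount option */
};

/*
 * Called before an inode timestamp is written. Returns false if the
 * write needs a nonzero epoch but the mount does not permit it.
 */
static bool sk_prepare_epoch_write(struct sk_super *sb, uint8_t epoch)
{
	if (epoch == 0)
		return true;		/* still readable by older kernels */
	if (!sb->epoch_allowed)
		return false;
	sb->features_incompat |= SK_FEAT_INCOMPAT_EPOCH;	/* first use */
	return true;
}
```
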
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 10:56 ` Arnd Bergmann
@ 2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
` (2 more replies)
2014-06-02 15:04 ` Chuck Lever
1 sibling, 3 replies; 124+ messages in thread
From: Theodore Ts'o @ 2014-06-02 11:57 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid 2030ies, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later....
Even for ext4, it's not quite so simple as that. You only have
support for times post 2038 if you are using an inode size > 128
bytes. There are a very, very large number of machines which even
today, are using 128 byte inodes with ext4 for performance reasons.
The vast majority of those machines which I know of can probably move
to 256 byte inodes relatively easily, since hard drive replacement
cycles are order 5-6 years tops, so I'm not that concerned, but it
just goes to show this is a very complicated problem.
And even if we're talking about flash and embedded devices, the good
news is if you assume that 10 years is enough time for people to
update their embedded OS builds, and that the vast majority of
deployed devices will probably only be in service for 10-15 years, we
do have enough time to make file system format changes, although
admittedly we can't afford to dilly-dally.
Regards,
- Ted

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
@ 2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 13:15 ` Theodore Ts'o
2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
Ok, I see.
I also now noticed this comment above EXT4_FITS_IN_INODE():
"For new inodes we always reserve enough space for the kernel's known
extended fields, but for inodes created with an old kernel this might
not have been the case. None of the extended inode fields is critical
for correct filesystem operation."
Do we have to worry about this for inodes that contain extended
attributes and that get updated after 2038?
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 12:52 ` Arnd Bergmann
2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 12:52 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
One stupid question about the current code:
static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
	if (sizeof(time->tv_sec) > 4)
		time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
				<< 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode)			\
do {									\
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime))		\
		(einode)->xtime.tv_sec =				\
			(signed)le32_to_cpu((raw_inode)->xtime);	\
	else								\
		(einode)->xtime.tv_sec = 0;				\
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra))	\
		ext4_decode_extra_time(&(einode)->xtime,		\
				       raw_inode->xtime ## _extra);	\
	else								\
		(einode)->xtime.tv_nsec = 0;				\
} while (0)
For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?
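To make the concern concrete, here is a small user-space sketch. It is not the ext4 code: the helper names and the two-bit epoch width are assumptions for illustration, and it assumes the encoder stored a nonzero epoch value for the time in question. Once the low 32 bits have been sign-extended to a negative 64-bit value, bits 32 and up are already all ones, so OR-ing epoch bits into them changes nothing, while adding them does:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EPOCH_MASK 0x3u	/* assumed: two epoch bits */

/* Sign-extend the 32-bit seconds, then OR in the epoch bits. */
static int64_t sketch_decode_or(uint32_t raw_sec, uint32_t extra)
{
	int64_t sec = (int32_t)raw_sec;

	sec |= (int64_t)(extra & SKETCH_EPOCH_MASK) << 32;
	return sec;
}

/* Same starting point, but add the epoch bits instead. */
static int64_t sketch_decode_add(uint32_t raw_sec, uint32_t extra)
{
	int64_t sec = (int32_t)raw_sec;

	sec += (int64_t)(extra & SKETCH_EPOCH_MASK) << 32;
	return sec;
}
```

With raw_sec = 0x80000000 and an epoch value of 1, the OR variant still yields -2^31 (a date in 1901), while the add variant yields 2^31 (January 2038).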
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:52 ` Arnd Bergmann
@ 2014-06-02 13:07 ` Theodore Ts'o
2014-06-02 15:01 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet. The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).
And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970. (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)
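A minimal sketch of the superblock-offset idea, assuming a hypothetical epoch_offset field and an unsigned 32-bit on-disk seconds value. These names are illustrative and not part of any ext4 on-disk format:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical superblock field: seconds added to every on-disk value. */
struct sketch_sb_epoch {
	int64_t epoch_offset;
};

/* Unsigned 32-bit on-disk seconds -> 64-bit seconds since 1970. */
static int64_t sketch_decode_sec(const struct sketch_sb_epoch *sb, uint32_t raw)
{
	return sb->epoch_offset + (int64_t)raw;
}

/* Returns 0 and fills *raw, or -1 if sec falls outside the 2^32 s window. */
static int sketch_encode_sec(const struct sketch_sb_epoch *sb, int64_t sec,
			     uint32_t *raw)
{
	int64_t rel = sec - sb->epoch_offset;

	if (rel < 0 || rel > (int64_t)UINT32_MAX)
		return -1;
	*raw = (uint32_t)rel;
	return 0;
}
```

With epoch_offset = 0 this is the plain unsigned encoding, which is why a scan for pre-1970 times would be needed before setting such a flag.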
Cheers,
- Ted

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 12:38 ` Arnd Bergmann
@ 2014-06-02 13:15 ` Theodore Ts'o
0 siblings, 0 replies; 124+ messages in thread
From: Theodore Ts'o @ 2014-06-02 13:15 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?
In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps. There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added. That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.
I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.
Cheers,
- Ted

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-05-31 5:54 ` Dave Chinner
2014-05-31 8:41 ` H. Peter Anvin
@ 2014-06-02 14:00 ` Joseph S. Myers
1 sibling, 0 replies; 124+ messages in thread
From: Joseph S. Myers @ 2014-06-02 14:00 UTC (permalink / raw)
To: Dave Chinner
Cc: H. Peter Anvin, Arnd Bergmann, linux-kernel, linux-arch,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, xfs
On Sat, 31 May 2014, Dave Chinner wrote:
> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp....
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly....
I don't see anything new about this issue. All problems that could arise
from the kernel being able to represent a timestamp some filesystems can't
are problems that already apply with 64-bit kernels using 64-bit time_t
internally. So while as part of Y2038-preparedness we do need a clear
understanding of which filesystems have what timestamp limits and what
happens with timestamps beyond those limits, I think this is a separate
strand of the problem - one that applies to both 32-bit and 64-bit systems
- from the more general issue for 32-bit systems.
--
Joseph S. Myers
joseph@codesourcery.com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:57 ` Theodore Ts'o
2014-06-02 12:38 ` Arnd Bergmann
2014-06-02 12:52 ` Arnd Bergmann
@ 2014-06-02 14:52 ` H. Peter Anvin
2 siblings, 0 replies; 124+ messages in thread
From: H. Peter Anvin @ 2014-06-02 14:52 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Arnd Bergmann, Nicolas Pitre, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
> On Jun 2, 2014, at 4:57, "Theodore Ts'o" <tytso@mit.edu> wrote:
>
>> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>>
>> I think you misunderstood what I suggested: the intent is to avoid
>> seeing things break in 2038 by making them break much earlier. We have
>> a solution for ext2 file systems, it's called ext4, and we just need
>> to ensure that everybody knows they have to migrate eventually.
>>
>> At some point before the mid 2030ies, you should no longer be able to
>> build a kernel that has support for ext2 or any other module that will
>> run into bugs later....
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
>
> And even if we're talking about flash and embedded devices, the good
> news is if you assume that 10 years is enough time for people to
> update their embedded OS builds, and that the vast majority of
> deployed devices will probably only be in service for 10-15 years, we
> do have enough time to make file system format changes, although
> admittedly we can't afford to dilly-dally.
I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, extend the file system online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. hot plug bays), which is really damned nice.

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 13:07 ` Theodore Ts'o
@ 2014-06-02 15:01 ` Arnd Bergmann
0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 15:01 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, linux-kernel,
linux-arch, joseph, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, xfs
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet. The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970. (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)
FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with
inode->i_mtime = le32_to_cpu(raw_inode->mtime);
which results in interpreting the time as 'signed' on 32-bit
kernels, but as 'unsigned' on 64-bit kernels. This could have been
done intentionally to extend the valid time range to 2106 on 64-bit
kernels, but it seems more likely that the code was written with
no thought given to 64-bit time_t at all. I see this pattern on
p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2,
jfs, minix, nfsv2/v3 (this was clearly intentional and is
spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv,
and ufs (protocol version 1 only).
The other behavior I see is to treat the on-disk 32-bit value
as signed on both 32-bit and 64-bit kernels:
inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);
this seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's
relatively rare: exofs, ext2/3/4 (good old inodes) and xfs
are the only ones doing this.
In case of ext2/3/4, the sign handling was introduced here:
http://www.spinics.net/lists/linux-ext4/msg01758.html
exofs and xfs seem to have done it like this for all of git
history.
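The difference between the two patterns can be reproduced in user space. This is only a sketch with made-up helper names, where the return type stands in for the width of the kernel's time_t:

```c
#include <assert.h>
#include <stdint.h>

/* First pattern, 32-bit time_t: plain assignment wraps to a signed value. */
static int32_t sketch_load_time32(uint32_t raw)
{
	return (int32_t)raw;
}

/* First pattern, 64-bit time_t: plain assignment zero-extends. */
static int64_t sketch_load_time64(uint32_t raw)
{
	return raw;
}

/* Second pattern: the explicit (signed) cast sign-extends on both. */
static int64_t sketch_load_signed(uint32_t raw)
{
	return (int32_t)raw;
}
```

For an on-disk value of 0x80000000, the first pattern gives -2^31 with a 32-bit time_t but +2^31 with a 64-bit one, while the explicit cast gives -2^31 in both cases.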
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 10:56 ` Arnd Bergmann
2014-06-02 11:57 ` Theodore Ts'o
@ 2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
` (2 more replies)
1 sibling, 3 replies; 124+ messages in thread
From: Chuck Lever @ 2014-06-02 15:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, LKML Kernel,
linux-arch, joseph, john.stultz, Christoph Hellwig, tglx, geert,
lftan, linux-fsdevel, xfs, Linux NFS Mailing List
On Jun 2, 2014, at 6:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>>
>>> For actually running kernels beyond 2038, the best idea I've seen so
>>> far is to disallow all broken code at compile time. I don't see
>>> a choice but to audit the entire kernel for invalid uses on both
>>> 32 and 64 bit in the next few years. A lot of code will get changed
>>> in the process so we can actually keep running 32-bit kernels and
>>> file systems, but other code will likely go away:
>>>
>>> * any system calls that pass a time_t, timeval or timespec on
>>> 32-bit systems return -ENOSYS, to ensure all user land uses
>>> the replacements we will put into place
>>> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>>> from the kernel, and all code using it left out.
>>> * ext2 and ext3 file system code will have to be disabled, but that's
>>> fine since ext4 can mount old file systems.
>>
>> Syscalls and libs can be "fixed". Existing filesystem content might
>> not. So if you need to mount some old media in read-write mode after
>> 2038 and that happens to contain an ext2 or similarly limited filesystem
>> then it'd better just "work". Having the kernel refuse to modify the
>> filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid 2030ies, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.
Im wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.
NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).
NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).
The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.
It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.
Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).
If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?
RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.
An alternative would be to cap the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
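The two options (reject or cap) might look roughly like this. The helper names are hypothetical and this is not the existing nfs3_proc_setattr() code, only a sketch of the range handling on a 64-bit seconds value:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Option 1: refuse anything that does not fit an unsigned 32-bit field. */
static int sketch_nfs3_check_sec(int64_t sec)
{
	if (sec < 0 || sec > (int64_t)UINT32_MAX)
		return -EINVAL;
	return 0;
}

/* Option 2: cap to the representable range instead of failing. */
static uint32_t sketch_nfs3_clamp_sec(int64_t sec)
{
	if (sec < 0)
		return 0;		/* pre-epoch -> 1970-01-01 */
	if (sec > (int64_t)UINT32_MAX)
		return UINT32_MAX;	/* too large -> 2106 */
	return (uint32_t)sec;
}
```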
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
@ 2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 1 reply; 124+ messages in thread
From: Theodore Ts'o @ 2014-06-02 15:31 UTC (permalink / raw)
To: Chuck Lever
Cc: Arnd Bergmann, Nicolas Pitre, H. Peter Anvin, Dave Chinner,
LKML Kernel, linux-arch, joseph, john.stultz, Christoph Hellwig,
tglx, geert, lftan, linux-fsdevel, xfs, Linux NFS Mailing List
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.
I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".
We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.
It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.
- Ted

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:31 ` Theodore Ts'o
@ 2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
0 siblings, 2 replies; 124+ messages in thread
From: H. Peter Anvin @ 2014-06-02 17:12 UTC (permalink / raw)
To: Theodore Ts'o, Chuck Lever, Arnd Bergmann, Nicolas Pitre,
Dave Chinner, LKML Kernel, linux-arch, joseph, john.stultz,
Christoph Hellwig, tglx, geert, lftan, linux-fsdevel, xfs,
Linux NFS Mailing List
On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
>
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
>
(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.
> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime
> option), make, tmpwatch, et al., that they can't make any presumption
> about the comparability of any timestamp which has a value of
> TIME_UNDEFINED.
>
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.
This assumes that we actually know that that is the case, which may be
an aggressive assumption.
-hpa

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 17:12 ` H. Peter Anvin
@ 2014-06-02 18:50 ` Arnd Bergmann
2014-06-02 22:29 ` Theodore Ts'o
1 sibling, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 18:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Theodore Ts'o, Chuck Lever, Nicolas Pitre, Dave Chinner,
LKML Kernel, linux-arch, joseph, john.stultz, Christoph Hellwig,
tglx, geert, lftan, linux-fsdevel, xfs, Linux NFS Mailing List
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.
Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date.
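Both readings of the same 32-bit pattern can be checked with gmtime(3). This is a sketch with made-up helper names, and it assumes a user space where time_t is 64 bits wide:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Break down seconds-since-1970 as UTC; assumes a 64-bit time_t. */
static struct tm sketch_utc(int64_t sec)
{
	struct tm zero = { 0 };
	time_t t = (time_t)sec;
	struct tm *tm = gmtime(&t);

	return tm ? *tm : zero;
}

/* Signed reading of the 32-bit pattern, as for (time_t)-1. */
static int64_t sketch_as_signed(uint32_t raw)
{
	return (int32_t)raw;
}

/* Unsigned reading, as NFSv3 specifies for its seconds field. */
static int64_t sketch_as_unsigned(uint32_t raw)
{
	return raw;
}
```

For 0xFFFFFFFF the signed reading breaks down to 1969-12-31 23:59:59 UTC and the unsigned one to 2106-02-07 06:28:15 UTC, which is the 07:28:15 CET quoted above.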
If we had the choice, I'd go for something like 1, i.e.
"Thu Jan 1 01:00:01 CET 1970".
> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime
> > option), make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.
It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be at a number of very different points
of time.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
@ 2014-06-02 18:52 ` Arnd Bergmann
2014-06-02 18:58 ` Roger Willcocks
2 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 18:52 UTC (permalink / raw)
To: Chuck Lever
Cc: Nicolas Pitre, H. Peter Anvin, Dave Chinner, LKML Kernel,
linux-arch, joseph, john.stultz, Christoph Hellwig, tglx, geert,
lftan, linux-fsdevel, xfs, Linux NFS Mailing List
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.
If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.
This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at least in the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 15:04 ` Chuck Lever
2014-06-02 15:31 ` Theodore Ts'o
2014-06-02 18:52 ` Arnd Bergmann
@ 2014-06-02 18:58 ` Roger Willcocks
2014-06-02 19:04 ` Chuck Lever
2 siblings, 1 reply; 124+ messages in thread
From: Roger Willcocks @ 2014-06-02 18:58 UTC (permalink / raw)
To: Chuck Lever
Cc: Arnd Bergmann, Nicolas Pitre, linux-arch, Linux NFS Mailing List,
LKML Kernel, lftan, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.
This could represent 1970 +/- 272 years.
Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.
Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.
Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.
Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)
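As a sketch of the bit layout only (the struct and helper names are made up, and nothing like this exists in RFC 1813): seconds bits {33,32} travel in nanoseconds bits {31,30}, and the receiver sign-extends from bit 33. Since 2^33 seconds is roughly 272 years, that gives the range above.

```c
#include <assert.h>
#include <stdint.h>

/* Same wire shape as nfstime3: two unsigned 32-bit words. */
struct sketch_nfstime3 {
	uint32_t seconds;
	uint32_t nseconds;
};

/* Store the low 32 seconds bits as-is, bits 33:32 in nseconds bits 31:30. */
static struct sketch_nfstime3 sketch_pack(int64_t sec, uint32_t nsec)
{
	uint64_t u = (uint64_t)sec;
	struct sketch_nfstime3 w;

	w.seconds = (uint32_t)u;
	w.nseconds = (nsec & 0x3fffffffu) |
		     ((uint32_t)((u >> 32) & 0x3u) << 30);
	return w;
}

/* Rebuild the 34-bit value and sign-extend it from bit 33. */
static int64_t sketch_unpack_sec(struct sketch_nfstime3 w)
{
	uint64_t u = ((uint64_t)(w.nseconds >> 30) << 32) | w.seconds;

	if (u & (1ULL << 33))
		u |= ~((1ULL << 34) - 1);
	return (int64_t)u;
}

/* The nanoseconds proper live in the low 30 bits. */
static uint32_t sketch_unpack_nsec(struct sketch_nfstime3 w)
{
	return w.nseconds & 0x3fffffffu;
}
```

For any seconds value between 0 and 2^32 - 1 the two borrowed bits are zero, so the words on the wire are identical to what an unmodified peer would send.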
--
Roger

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 18:58 ` Roger Willcocks
@ 2014-06-02 19:04 ` Chuck Lever
2014-06-02 19:10 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: Chuck Lever @ 2014-06-02 19:04 UTC (permalink / raw)
To: Roger Willcocks
Cc: Arnd Bergmann, Nicolas Pitre, linux-arch, Linux NFS Mailing List,
LKML Kernel, lftan, Christoph Hellwig, john.stultz,
H. Peter Anvin, linux-fsdevel, geert, tglx, xfs, joseph
On Jun 2, 2014, at 2:58 PM, Roger Willcocks <roger@filmlight.ltd.uk> wrote:
>
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
>
>> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
>> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
>> (See the definition of nfstime3 in RFC 1813).
>>
>
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
>
> This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
>
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.
You would have to get the IETFs NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.
But I suspect the answer youd get is Use NFSv4.
> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
>
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)
Theres no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.
Practically speaking, you should assume that the NFSv3 protocol is never
going to change.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 13:52 ` Joseph S. Myers
@ 2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 21:02 ` Joseph S. Myers
0 siblings, 2 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 19:19 UTC (permalink / raw)
To: Joseph S. Myers
Cc: linux-kernel, linux-arch, john.stultz, hch, tglx, geert, lftan,
hpa, linux-fsdevel, ceph-devel, cluster-devel, coda, codalist,
fuse-devel, linux-afs, linux-btrfs, linux-cifs, linux-ext4,
linux-f2fs-devel, linux-mtd, linux-nfs, linux-ntfs-dev,
linux-scsi, logfs, ocfs2-devel, reiserfs-devel, samba-technical,
xfs
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
>
> > a) is this the right approach in general? The previous discussion
> > pointed this way, but there may be other opinions.
>
> The syscall changes seem like the sort of thing I'd expect, although
> patches adding new syscalls or otherwise affecting the kernel/userspace
> interface (as opposed to those relating to an individual filesystem)
> should go to linux-api as well as other relevant lists.
Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.

Two more questions for you:

- are you (and others) happy with adding this type of stat syscall
  (fstatat64/fstat64) as opposed to the more generic xstat that has
  been discussed in the past and that never made it through the
  bike-shedding discussion?

- once we have enough buy-in from reviewers to merge this initial
  series, should we proceed to define the rest of the syscall ABI
  (minus driver ioctls) so glibc and kernel can do the conversion
  on top of that, or should we better try to do things one syscall
  family at a time and actually get the kernel to handle them
  correctly internally?

Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:19 ` Arnd Bergmann
@ 2014-06-02 19:26 ` H. Peter Anvin
2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:02 ` Joseph S. Myers
1 sibling, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-06-02 19:26 UTC (permalink / raw)
To: Arnd Bergmann, Joseph S. Myers
Cc: linux-kernel, linux-arch, john.stultz, hch, tglx, geert, lftan,
linux-fsdevel, ceph-devel, cluster-devel, coda, codalist,
fuse-devel, linux-afs, linux-btrfs, linux-cifs, linux-ext4,
linux-f2fs-devel, linux-mtd, linux-nfs, linux-ntfs-dev,
linux-scsi, logfs, ocfs2-devel, reiserfs-devel, samba-technical,
xfs
On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
>> On Fri, 30 May 2014, Arnd Bergmann wrote:
>>
>>> a) is this the right approach in general? The previous discussion
>>> pointed this way, but there may be other opinions.
>>
>> The syscall changes seem like the sort of thing I'd expect, although
>> patches adding new syscalls or otherwise affecting the kernel/userspace
>> interface (as opposed to those relating to an individual filesystem)
>> should go to linux-api as well as other relevant lists.
>
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
>
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
>
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
>
The bit that is really going to hurt is every single ioctl that uses a
timespec.

Honestly, though, I really don't understand the point with "struct
inode_time". It seems like the zeroth-order thing is to change the
kernel internal version of struct timespec to have a 64-bit time... it
isn't just about inodes. We then should be explicit about the external
uses of time, and use accessors.

-hpa

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:26 ` H. Peter Anvin
@ 2014-06-02 19:55 ` Arnd Bergmann
2014-06-02 21:57 ` H. Peter Anvin
0 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-02 19:55 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Joseph S. Myers, linux-kernel, linux-arch, john.stultz, hch,
tglx, geert, lftan, linux-fsdevel, ceph-devel, cluster-devel,
coda, codalist, fuse-devel, linux-afs, linux-btrfs, linux-cifs,
linux-ext4, linux-f2fs-devel, linux-mtd, linux-nfs,
linux-ntfs-dev, linux-scsi, logfs, ocfs2-devel, reiserfs-devel,
samba-technical, xfs
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >>
> >>> a) is this the right approach in general? The previous discussion
> >>> pointed this way, but there may be other opinions.
> >>
> >> The syscall changes seem like the sort of thing I'd expect, although
> >> patches adding new syscalls or otherwise affecting the kernel/userspace
> >> interface (as opposed to those relating to an individual filesystem)
> >> should go to linux-api as well as other relevant lists.
> >
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> >
> > Two more questions for you:
> >
> > - are you (and others) happy with adding this type of stat syscall
> > (fstatat64/fstat64) as opposed to the more generic xstat that has
> > been discussed in the past and that never made it through the bike-
> > shedding discussion?
> >
> > - once we have enough buy-in from reviewers to merge this initial
> > series, should we proceed to define rest of the syscall ABI
> > (minus driver ioctls) so glibc and kernel can do the conversion
> > on top of that, or should we better try to do things one syscall
> > family at a time and actually get the kernel to handle them
> > correctly internally?
> >
>
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
>
> Honestly, though, I really don't understand the point with "struct
> inode_time". It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes. We then should be explicit about the external
> uses of time, and use accessors.
I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).

For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.

I just tried building an arm32 kernel with an s64 time_t, and that
failed horribly: I get linker errors for missing 64-bit divides
and lots of warnings for code that expects time_t pointers to
functions taking a 'long' or vice versa. I also think the only
way to maintain ABI compatibility is to separate the internal uses
from the interface, which means auditing all code in the end.

Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:19 ` Arnd Bergmann
2014-06-02 19:26 ` H. Peter Anvin
@ 2014-06-02 21:02 ` Joseph S. Myers
2014-06-04 15:05 ` Arnd Bergmann
1 sibling, 1 reply; 124+ messages in thread
From: Joseph S. Myers @ 2014-06-02 21:02 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-kernel, linux-arch, john.stultz, hch, tglx, geert, lftan,
hpa, linux-fsdevel, ceph-devel, cluster-devel, coda, codalist,
fuse-devel, linux-afs, linux-btrfs, linux-cifs, linux-ext4,
linux-f2fs-devel, linux-mtd, linux-nfs, linux-ntfs-dev,
linux-scsi, logfs, ocfs2-devel, reiserfs-devel, samba-technical,
xfs
On Mon, 2 Jun 2014, Arnd Bergmann wrote:
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).
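
The overflow check in such fallback code could look roughly like the
following sketch. The type and function names are hypothetical and are
not glibc or kernel interfaces. The choice of EOVERFLOW and EINVAL is
also an assumption, since the error handling is meant to follow
whatever the kernel ends up doing.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit and legacy 32-bit time representations. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

struct ts32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Narrow a 64-bit timestamp for a legacy 32-bit kernel interface.
 * Returns 0 on success, -EINVAL for a malformed nanoseconds value,
 * or -EOVERFLOW if the seconds do not fit in 32 bits.
 */
static int ts64_to_ts32(const struct ts64 *in, struct ts32 *out)
{
	if (in->tv_nsec < 0 || in->tv_nsec > 999999999)
		return -EINVAL;
	if (in->tv_sec > INT32_MAX || in->tv_sec < INT32_MIN)
		return -EOVERFLOW;

	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}
```

A wrapper would try the 64-bit syscall first and only fall back to this
narrowing path when that syscall is unavailable.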
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
> (fstatat64/fstat64) as opposed to the more generic xstat that has
> been discussed in the past and that never made it through the bike-
> shedding discussion?
I am.
> - once we have enough buy-in from reviewers to merge this initial
> series, should we proceed to define rest of the syscall ABI
> (minus driver ioctls) so glibc and kernel can do the conversion
> on top of that, or should we better try to do things one syscall
> family at a time and actually get the kernel to handle them
> correctly internally?
I don't have any comments on that ordering question.
--
Joseph S. Myers
joseph@codesourcery.com

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 19:55 ` Arnd Bergmann
@ 2014-06-02 21:57 ` H. Peter Anvin
2014-06-03 14:22 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: H. Peter Anvin @ 2014-06-02 21:57 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Joseph S. Myers, linux-kernel, linux-arch, john.stultz, hch,
tglx, geert, lftan, linux-fsdevel, ceph-devel, cluster-devel,
coda, codalist, fuse-devel, linux-afs, linux-btrfs, linux-cifs,
linux-ext4, linux-f2fs-devel, linux-mtd, linux-nfs,
linux-ntfs-dev, linux-scsi, logfs, ocfs2-devel, reiserfs-devel,
samba-technical, xfs
On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
>>
>> The bit that is really going to hurt is every single ioctl that uses a
>> timespec.
>>
>> Honestly, though, I really don't understand the point with "struct
>> inode_time". It seems like the zeroeth-order thing is to change the
>> kernel internal version of struct timespec to have a 64-bit time... it
>> isn't just about inodes. We then should be explicit about the external
>> uses of time, and use accessors.
>
> I picked these because they are fairly isolated from all other uses,
> in particular since inode times are the only things where we really
> care about times in the distant past or future (decades away as opposed
> to things that happened between boot and shutdown).
>
If nothing else, I would expect to be able to set the system time to
weird values for testing. So I'm not so sure I agree with that...
> For other kernel-internal uses, we may be better off migrating to
> a completely different representation, such as nanoseconds since
> boot or the architecture specific ktime_t, but this is really something
> to decide for each subsystem.
Having a bunch of different time representations in the kernel seems
like a real headache...
-hpa

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 17:12 ` H. Peter Anvin
2014-06-02 18:50 ` Arnd Bergmann
@ 2014-06-02 22:29 ` Theodore Ts'o
2014-06-02 22:32 ` H. Peter Anvin
1 sibling, 1 reply; 124+ messages in thread
From: Theodore Ts'o @ 2014-06-02 22:29 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Chuck Lever, Arnd Bergmann, Nicolas Pitre, Dave Chinner,
LKML Kernel, linux-arch, joseph, john.stultz, Christoph Hellwig,
tglx, geert, lftan, linux-fsdevel, xfs, Linux NFS Mailing List
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.
We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type. So in that case, the fallback would be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite flag" would be set.
Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.
Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC
time may just be plain wrong. No system is going to be perfect, but
it should be possible to make things better, at least for certain classes of
hardware.
And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.
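
To make the gettimestampofday(2) idea above a little more concrete,
here is a minimal sketch of what such a structure and its flags might
look like. No such syscall exists. The struct name, the flag names and
the helper are all hypothetical, and the field widths are only one
possible choice.

```c
#include <stdint.h>

/* Hypothetical flag bits for the ts_flags field. */
#define TS_FLAG_INVALID    (1u << 0) /* time is not known at all */
#define TS_FLAG_NO_LEGACY  (1u << 1) /* does not fit a 32-bit time_t */
#define TS_FLAG_UNCERTAIN  (1u << 2) /* RTC may be wrong, e.g. bad battery */

/* Hypothetical timestamp with a 64-bit seconds field and status flags. */
struct timestamp_ex {
	int64_t  ts_sec;
	int64_t  ts_nsec;
	uint32_t ts_flags;
};

/*
 * Build a timestamp and derive its flags. have_rtc and rtc_trusted
 * stand in for whatever the platform can actually report.
 */
static struct timestamp_ex timestamp_ex_make(int64_t sec, int64_t nsec,
					     int have_rtc, int rtc_trusted)
{
	struct timestamp_ex ts = { sec, nsec, 0 };

	if (!have_rtc)
		ts.ts_flags |= TS_FLAG_INVALID;
	else if (!rtc_trusted)
		ts.ts_flags |= TS_FLAG_UNCERTAIN;
	if (sec > INT32_MAX || sec < INT32_MIN)
		ts.ts_flags |= TS_FLAG_NO_LEGACY;
	return ts;
}
```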
- Ted

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-02 11:43 ` Arnd Bergmann
@ 2014-06-03 0:32 ` Dave Chinner
2014-06-03 7:33 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: Dave Chinner @ 2014-06-03 0:32 UTC (permalink / raw)
To: Arnd Bergmann
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > all file systems at least times until 2106, because they treat
> > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > a completely different representation. My guess is that somebody
> > > > earlier spent a lot of work on making that happen.
> > > >
> > > > The exceptions are:
> > > >
> > > > * exofs uses signed values, which can probably be changed to be
> > > > consistent with the others.
> > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > a signed 'char' type (otherwise it's 2155).
> > > > * udf can represent times for many thousands of years through a
> > > > 16-bit year representation, but the code to convert to epoch
> > > > uses a const array that ends at 2038.
> > > > * afs uses signed seconds and can probably be fixed
> > > > * coda relies on user space time representation getting passed
> > > > through an ioctl.
> > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > where they really use signed.
> > > >
> > > > I was confused about XFS since I didn't noticed that there are
> > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > >
> > > You've missed an awful lot more than just the implications for the
> > > core kernel code.
> > >
> > > There's a good chance such changes propagate to APIs elsewhere in
> > > the filesystems, because something you haven't realised is that XFS
> > > effectively exposes the on-disk timestamp format directly to
> > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > by the online defragmenter.
>
> I really didn't look at them at all, as ioctl is very late on my
> mental list of things to change. I do realize that a lot of drivers
> and file systems do have ioctls that pass time values and we need to
> address them one by one.
>
> I just looked at the ioctls you mentioned but don't see how open-by-handle
> is affected by this. Can you point me to what you mean?
Sorry, I misremembered how some of the XFS open-by-handle code works
in userspace (XFS has a pretty rich open-by-handle ioctl() interface
that predates the kernel syscalls by at least 10 years). Basically
there is code in userspace that uses the information returned from
bulkstat to construct file handles to pass to the open-by-handle
ioctls. xfs_fsr then uses the combination of open-by-handle from the
bulkstat output and the bulkstat output to feed into the swap extent
ioctls....
i.e. the filesystem's idea of what time is is passed to userspace as
an opaque cookie in this case, but it is not used directly by the
open-by-handle interfaces like I implied it was.
> > Just to put that in context, here's the kernel patch to add extended
> > epoch support to XFS. It's completely untested as I haven't done any
> > userspace code changes to enable the feature. However, it should
> > give you an indication of how far the simple act of changing the
> > kernel time representation spread through the filesystem. This does
> > not include any of the VFS infrastructure to specifying the range of
> > supported timestamps. It survives some smoke testing, but dies when
> > the online defragmenter starts using the bulkstat and swap extent
> > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> > probably don't have that all sorted correctly yet...
> >
> > To test extended epoch support, however, I need to some fstests that
> > define and validate the behaviour of the new syscalls - until we get
> > those we can't validate that the filesystem follows the spec
> > properly. I also suspect we are going to need an interface to query
> > the supported range of timestamps from a filesystem so that we can
> > test boundary conditions in an automated fashion....
>
> Thanks a lot for having an initial look at this yourself!
>
> I'd still consider the two problems largely orthogonal.
Depends how you look at it. You can't extend the kernel's idea of
time without permanent storage being able to specify the supported
bounds - that's a non-negotiable aspect of introducing extended
epoch timestamp support.
The actual addition of extended timestamp support to each individual
filesystem is orthogonal to the introduction of the struct
inode_time, but doing this addition properly is dependent on the VFS
infrastructure being there in the first place.
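
A minimal sketch of what "specifying the supported bounds" could mean
in code follows. The structure and helpers are hypothetical
illustrations, not the VFS infrastructure under discussion. Whether an
out-of-range timestamp should be rejected or clamped is left open here,
so both variants are shown.

```c
#include <stdint.h>

/* Hypothetical per-filesystem range of representable seconds. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Returns 1 if the filesystem can store this timestamp, 0 otherwise. */
static int fs_time_in_range(const struct fs_time_range *r, int64_t sec)
{
	return sec >= r->min_sec && sec <= r->max_sec;
}

/* One possible policy: clamp to the nearest representable value. */
static int64_t fs_time_clamp(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec)
		return r->min_sec;
	if (sec > r->max_sec)
		return r->max_sec;
	return sec;
}
```

With a signed 32-bit on-disk format, the range would be
{ INT32_MIN, INT32_MAX }, which a test suite could then query to find
the boundary conditions.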
> My patch set
> (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> more like 64-bit kernels regarding inode time stamps, which does
> impact all the file systems that the a 64-bit time or the NFS
> unsigned epoch (1970-2106), while your patch extends the file
> system internal epoch (1901-2038 for XFS) so it can be used by
> anything that knows how to handle larger than 32-bit second values
> (either 64-bit kernel or 32-bit with inode_time patch).
Right, but the issue is that 64 bit second counters are broken right
now because most filesystems can't support more than 32 bit values.
So it doesn't matter whether it's 32 bit or 64 bit machines, just
adding explicit support for >32 bit second counters without doing
anything else just extends that brokenness into the indefinite
future.
If we don't fix it now (i.e in the new user API and supporting
infrastructure), then we'll *never be able to fix it* and we'll be
stuck with timestamps that do really weird things when you pass
arbitrary future dates to the kernel.
> > diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> > index 623bbe8..79f94722 100644
> > --- a/fs/xfs/xfs_dinode.h
> > +++ b/fs/xfs/xfs_dinode.h
> > @@ -21,11 +21,53 @@
> > #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
> > #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
> >
> > +/*
> > + * Inode timestamps get more complex when we consider supporting times beyond
> > + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> > + * more than a single extension by playing sign games, and that is still not
> > + * reliable. We also can't extend the timestamp structure because there is no
> > + * free space around them in the on-disk inode.
> > + *
> > + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> > + * in the inode. This can be a single byte for each timestamp and make use of
> > + * a hole we currently pad. This gives us another 255 epochs range for the
> > + * timestamps, but requires a superblock feature bit to indicate that these
> > + * fields have meaning and can be non-zero.
>
> Nice trick!
It's a pretty common way of extending the range of a variable for
on-disk formats. The on-disk format is completely disconnected from
the in-memory representation, so it's "easy" to play games like this
within the on-disk format.
If you look closely at ext4, you'll see all the lo/hi variables
where extension of 16->32 bits or 32->48 bits has occurred from
the ext2/3 variable formats... ;)
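
For illustration, here is one way an 8-bit epoch counter could extend a
signed 32-bit seconds field. This is a sketch under stated assumptions,
not the XFS patch: the names are hypothetical, and the mapping assumes
that each epoch step adds 2^32 seconds and that epoch 0 keeps the
existing signed interpretation.

```c
#include <stdint.h>

#define EPOCH_SPAN (INT64_C(1) << 32) /* seconds covered by one epoch */
#define HALF_SPAN  (INT64_C(1) << 31)

/* Hypothetical on-disk pair: epoch byte plus legacy signed seconds. */
struct disk_time {
	uint8_t epoch;
	int32_t sec;
};

/*
 * Split a 64-bit seconds value into (epoch, sec).
 * Returns 0 on success, -1 if the value is outside the 256 epochs.
 */
static int disk_time_encode(int64_t t, struct disk_time *out)
{
	int64_t biased;

	if (t < -HALF_SPAN || t >= 255 * EPOCH_SPAN + HALF_SPAN)
		return -1;

	/* Bias so that epoch 0 starts at zero; biased is in [0, 2^40). */
	biased = t + HALF_SPAN;
	out->epoch = (uint8_t)(biased >> 32);
	out->sec = (int32_t)((biased & 0xffffffff) - HALF_SPAN);
	return 0;
}

/* Recombine (epoch, sec) into a 64-bit seconds value. */
static int64_t disk_time_decode(const struct disk_time *in)
{
	return (int64_t)in->epoch * EPOCH_SPAN + in->sec;
}
```

Under this mapping, every value that fits today's format encodes with
epoch 0 and an unchanged seconds field, which is why only timestamps
beyond the current range would need the feature bit.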
>
> > +static inline __uint8_t
> > +xfs_timestamp_epoch(
> > + struct timespec *time)
> > +{
> > + /* will be zero until the extended struct inode_time is introduced */
> > + return 0;
> > +}
> > +
> > +static inline __int32_t
> > +xfs_timestamp_sec(
> > + struct timespec *time)
> > +{
> > + return time->tv_sec;
> > +}
> > +
> > +static inline __kernel_time_t
> > +xfs_inode_time_from_epoch(
> > + __uint8_t epoch,
> > + __int32_t seconds)
> > +{
> > + /* need to handle non-zero epoch when struct inode_time is introduced */
> > + ASSERT(epoch == 0);
> > + return seconds;
> > +}
>
> Why don't you already implement epoch conversion for 64-bit kernels that
> are able to represent the time today?
Because I wasn't trying to solve the entire problem, just
demonstrate the infrastructure needed to support extended
timestamps.....
> This is how ext4 does it (I mean
> the sizeof() trick, not the bit stuffing they do):
....
> I guess if there is general agreement on introducing 'struct inode_time',
> we can skip that intermediate step.
Also, I don't like the concept of having filesystems that will work
on 64 bit but not 32 bit machines. Over the past 10 years, we've
managed to remove most of those differences from the VFS and XFS,
so adding new distinctions between 32/64 bit machines is not the
direction I want to head in.
As it is, I'm expecting to do this only after the struct inode_time
and the superblock "time range" infrastructure have been added to
the kernel and VFS. If that change is not made, then we've still
only got 32 bit time....
> > @@ -509,8 +509,11 @@ xfs_sb_has_ro_compat_feature(
> > }
> >
> > #define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
> > +#define XFS_SB_FEAT_INCOMPAT_EPOCH (1 << 1) /* Time beyond 2038 */
> > #define XFS_SB_FEAT_INCOMPAT_ALL \
> > - (XFS_SB_FEAT_INCOMPAT_FTYPE)
> > + (XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > + XFS_SB_FEAT_INCOMPAT_EPOCH | \
> > + 0)
> >
> > #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
>
> How does this flag get set?
mkfs.xfs
> Do you have to manually change it in the
> superblock? Since most of the time I'd suspect you wouldn't actually
> use it for the foreseeable future, would it make sense to have a mount
> option that allows it to be set, but doesn't actually change the
> superblock until the first inode gets written with a nonzero epoch?
Yes, we could set the flag on the first timestamp that goes beyond
the current epoch, but that has two problems:

	1. filesystem silently becomes incompatible with older
	   kernels so failed upgrade rollbacks become problematic; and
	2. It adds unnecessary complexity, as this will end up being
	   the default behaviour for all new filesystems within a year.
	   Then we end up with a mount option and conversion functions
	   that never get used but we have to support for years....

> That way, you'd still be able to mount it with an older kernel but
> also be forward compatible with time moving on.
We've got plenty of time to roll this out so I don't see any need
for putting in place temporary support mechanisms that unnecessarily
complicate the code.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-03 0:32 ` Dave Chinner
@ 2014-06-03 7:33 ` Arnd Bergmann
2014-06-03 8:41 ` Dave Chinner
0 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-03 7:33 UTC (permalink / raw)
To: Dave Chinner
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > all file systems at least times until 2106, because they treat
> > > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > > a completely different representation. My guess is that somebody
> > > > > earlier spent a lot of work on making that happen.
> > > > >
> > > > > The exceptions are:
> > > > >
> > > > > * exofs uses signed values, which can probably be changed to be
> > > > > consistent with the others.
> > > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > > a signed 'char' type (otherwise it's 2155).
> > > > > * udf can represent times for many thousands of years through a
> > > > > 16-bit year representation, but the code to convert to epoch
> > > > > uses a const array that ends at 2038.
> > > > > * afs uses signed seconds and can probably be fixed
> > > > > * coda relies on user space time representation getting passed
> > > > > through an ioctl.
> > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > > where they really use signed.
> > > > >
> > > > > I was confused about XFS since I didn't noticed that there are
> > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > >
> > > > You've missed an awful lot more than just the implications for the
> > > > core kernel code.
> > > >
> > > > There's a good chance such changes propagate to APIs elsewhere in
> > > > the filesystems, because something you haven't realised is that XFS
> > > > effectively exposes the on-disk timestamp format directly to
> > > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > > by the online defragmenter.
> >
> > I really didn't look at them at all, as ioctl is very late on my
> > mental list of things to change. I do realize that a lot of drivers
> > and file systems do have ioctls that pass time values and we need to
> > address them one by one.
> >
> > I just looked at the ioctls you mentioned but don't see how open-by-handle
> > is affected by this. Can you point me to what you mean?
>
> Sorry, I misremembered how some of the XFS open-by-handle code works
> in userspace (XFS has a pretty rich open-by-handle ioctl() interface
> that predates the kernel syscalls by at least 10 years). Basically
> there is code in userspace that uses the information returned from
> bulkstat to construct file handles to pass to the open-by-handle
> ioctls. xfs_fsr then uses the combination of open-by-handle from the
> bulkstat output and the bulkstat output to feed into the swap extent
> ioctls....
>
> i.e. the filesystem's idea of what time is is passed to userspace as
> an opaque cookie in this case, but it is not used directly by the
> open-by-handle interfaces like I implied it was.
Ok, I see.
> > My patch set
> > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > more like 64-bit kernels regarding inode time stamps, which does
> > impact all the file systems that the a 64-bit time or the NFS
> > unsigned epoch (1970-2106), while your patch extends the file
> > system internal epoch (1901-2038 for XFS) so it can be used by
> > anything that knows how to handle larger than 32-bit second values
> > (either 64-bit kernel or 32-bit with inode_time patch).
>
> Right, but the issue is that 64 bit second counters are broken right
> now because most filesystems can't support more than 32 bit values.
> So it doesn't matter whether it's 32 bit or 64 bit machines, just
> adding explicit support for >32 bit second counters without doing
> anything else just extends that brokenness into the indefinite
> future.
Of course, "most filesystems" are obsolete, and most of the modern
file systems already support >32 bit timestamps: ext4, btrfs, cifs,
f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
64-bit systems, which interprets time stamps with the high bit
set as years 2038-2106 rather than 1901-1969.
> If we don't fix it now (i.e in the new user API and supporting
> infrastructure), then we'll *never be able to fix it* and we'll be
> stuck with timestamps that do really weird things when you pass
> arbitrary future dates to the kernel.
We already have that. I agree it's fixable and we should fix it,
but I don't see how this is different from what we had 20 years
ago when Linux on Alpha first introduced a 64-bit time_t. It's
been this way on every 64-bit Linux system since.
> > This is how ext4 does it (I mean
> > the sizeof() trick, not the bit stuffing they do):
> ....
> > I guess if there is general agreement on introducing 'struct inode_time',
> > we can skip that intermediate step.
>
> Also, I don't like the concept of having filesystems that will work
> on 64 bit but not 32 bit machines. Over the past 10 years, we've
> managed to remove most of those differences from the VFS and XFS,
> so adding new distinctions between 32/64 bit machines is not the
> direction I want to head in.
>
> As it is, I'm expecting to do this only after the struct inode_time
> and the superblock "time range" infrastructure have been added to
> the kernel and VFS. If that change is not made, then we've still
> only got 32 bit time....
Ok.
> > Do you have to manually change it in the
> > superblock? Since most of the time I'd suspect you wouldn't actually
> > use it for the foreseeable future, would it make sense to have a mount
> > option that allows it to be set, but doesn't actually change the
> > superblock until the first inode gets written with a nonzero epoch?
>
> Yes, we could set the flag on the first timestamp that goes beyond
> the current epoch, but that has two problems:
>
> 1. filesystem silently becomes incompatible with older
> kernels so failed upgrade rollbacks become problematic; and
>
> 2. It adds unnecessary complexity, as this will end up being
> the default behaviour for all new filesystems within a year.
> Then we end up with a mount option and conversion functions
> that never get used but we have to support for years....
>
> > That way, you'd still be able to mount it with an older kernel but
> > also be forward compatible with time moving on.
>
> We've got plenty of time to roll this out so I don't see any need
> for putting in place temporary support mechanisms that unnecessarily
> complicate the code.
Ok, fair enough.
Arnd

*Re: [RFC 11/32] xfs: convert to struct inode_time
2014-06-03 7:33 ` Arnd Bergmann
@ 2014-06-03 8:41 ` Dave Chinner
2014-06-03 9:16 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: Dave Chinner @ 2014-06-03 8:41 UTC (permalink / raw)
To: Arnd Bergmann
Cc: H. Peter Anvin, linux-kernel, linux-arch, joseph, john.stultz,
hch, tglx, geert, lftan, linux-fsdevel, xfs
On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote:
> On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > My patch set
> > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > > more like 64-bit kernels regarding inode time stamps, which does
> > > impact all the file systems that use a 64-bit time or the NFS
> > > unsigned epoch (1970-2106), while your patch extends the file
> > > system internal epoch (1901-2038 for XFS) so it can be used by
> > > anything that knows how to handle larger than 32-bit second values
> > > (either 64-bit kernel or 32-bit with inode_time patch).
> >
> > Right, but the issue is that 64 bit second counters are broken right
> > now because most filesystems can't support more than 32 bit values.
> > So it doesn't matter whether it's 32 bit or 64 bit machines, just
> > adding explicit support for >32 bit second counters without doing
> > anything else just extends that brokenness into the indefinite
> > future.
>
> Of course, "most filesystems" are obsolete, and most of the modern
> file systems already support >32 bit timestamps: ext4, btrfs, cifs,
> f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
> except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> 64-bit systems, which interprets time stamps with the high bit
> set as years 2038-2106 rather than 1901-1969.
I'm not sure that's an entirely correct representation - the
remainder of the 32 bit-only timestamp filesystems don't actively
interpret the time stamp at all - it's just an opaque 32 bit value.
Hence the interpretation of the value depends on whether the
kernel treats it as signed or unsigned....
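As a minimal sketch of that signed/unsigned split (the helper names are
hypothetical, not existing kernel functions), the same opaque 32-bit
on-disk value widens to two different 64-bit second counts:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel API: widen an opaque 32-bit on-disk
 * seconds field to 64 bits under the two possible interpretations.
 */

/* Signed interpretation: high bit set means a date in 1901..1969.
 * (The uint32_t -> int32_t conversion is implementation-defined in C,
 * but wraps as expected with gcc and clang.) */
int64_t decode_sec_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Unsigned (NFSv3-style) interpretation: high bit set means 2038..2106. */
int64_t decode_sec_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

Values below 2^31 decode identically either way; only the upper half of
the range is ambiguous.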
> > infrastructure), then we'll *never be able to fix it* and we'll be
> > stuck with timestamps that do really weird things when you pass
> > arbitrary future dates to the kernel.
>
> We already have that. I agree it's fixable and we should fix it,
> but I don't see how this is different from what we had 20 years
> ago when Linux on Alpha first introduced a 64-bit time_t. It's
> been this way on every 64-bit Linux system since.
I see it differently: we've got 20 years more experience than when
the 64 bit time_t was introduced. That experience tells us that best
practices for API design are to range check every input to prevent
unintended side effects from occurring due to out-of-range data....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:57 ` H. Peter Anvin
@ 2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
2014-06-03 21:38 ` Dave Chinner
0 siblings, 2 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-03 14:22 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Joseph S. Myers, linux-kernel, linux-arch, john.stultz, hch,
tglx, geert, lftan, linux-fsdevel, ceph-devel, cluster-devel,
coda, codalist, fuse-devel, linux-afs, linux-btrfs, linux-cifs,
linux-ext4, linux-f2fs-devel, linux-mtd, linux-nfs,
linux-ntfs-dev, linux-scsi, logfs, ocfs2-devel, reiserfs-devel,
samba-technical, xfs
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> >
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> >
>
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...
I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.
> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
>
> Having a bunch of different time representations in the kernel seems
> like a real headache...
We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all for
removing a bunch of these from the kernel, but my feeling is that this is
one of the cases where we first have to add new ones in order to remove
those that are already there.
To complicate things further, we also have various time bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, or if that's the right one.
We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.
The possible uses I can see for non-ktime_t types in the kernel are:
* inodes need 96 bit timestamps to represent the full range of values
that can be stored in a file system, you made a convincing argument
for that. Almost everything else can fit into 64 bit on a 32-bit
kernel, in theory also on a 64-bit kernel if we want that.
* A number of interfaces pass relative timespecs: nanosleep(), poll(),
select(), sigtimedwait(), alarm(), futex() and probably more. There is
nothing wrong with the use of timespec here, and it may be good to
annotate that by using a new type (e.g. struct timeout) that is defined
as compatible with the current timespec.
* For new user interfaces, we need a new type such as the
__kernel_timespec64 I introduced, so it doesn't clash with the normal
user timespec that may be smaller, depending on the libc.
* A lot of drivers will need new ioctl commands, and for drivers that
just need time stamps (audio, v4l, sockets, ...) it may be more
efficient and more correct to use a new timestamp_t (e.g. boot time
64-bit nanoseconds) than __kernel_timespec64, which is not normally
monotonic and requires a normalization step. If we end up introducing
such a type in the user interface, we can also start using it in the
kernel.
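To illustrate the normalization step mentioned in the last point, here
is a rough sketch. The struct has the same layout as the
__kernel_timespec64 from patch 02; the timestamp_t typedef and both
helper names are assumptions made up for this example, and the question
of which clock base the value uses is left out entirely:

```c
#include <stdint.h>

/* Same layout as the __kernel_timespec64 proposed in patch 02/32. */
struct __kernel_timespec64 {
	long long tv_sec;
	long long tv_nsec;
};

/* Hypothetical flat type: 64-bit nanoseconds. */
typedef int64_t timestamp_t;

#define NSEC_PER_SEC 1000000000LL

/* Bring tv_nsec into [0, NSEC_PER_SEC). Overflow of tv_sec itself is
 * not handled in this sketch. */
void timespec64_normalize(struct __kernel_timespec64 *ts)
{
	long long carry = ts->tv_nsec / NSEC_PER_SEC;

	ts->tv_sec += carry;
	ts->tv_nsec -= carry * NSEC_PER_SEC;
	if (ts->tv_nsec < 0) {
		ts->tv_sec--;
		ts->tv_nsec += NSEC_PER_SEC;
	}
}

/* Flatten to nanoseconds. Returns 0 on success, -1 if the value does
 * not fit. The bound is slightly conservative: 2^63 ns is a little
 * more than 9223372036 seconds. */
int timespec64_to_timestamp(struct __kernel_timespec64 ts, timestamp_t *out)
{
	timespec64_normalize(&ts);
	if (ts.tv_sec > 9223372035LL || ts.tv_sec < -9223372036LL)
		return -1;
	*out = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
	return 0;
}
```

A flat nanosecond value needs no such step, which is part of why it may
be the more efficient choice for drivers that only need time stamps.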
Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 14:22 ` Arnd Bergmann
2014-06-03 14:33 ` Joseph S. Myers
@ 2014-06-03 21:38 ` Dave Chinner
2014-06-04 15:03 ` Arnd Bergmann
1 sibling, 1 reply; 124+ messages in thread
From: Dave Chinner @ 2014-06-03 21:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: H. Peter Anvin, Joseph S. Myers, linux-kernel, linux-arch,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, ceph-devel,
cluster-devel, coda, codalist, fuse-devel, linux-afs,
linux-btrfs, linux-cifs, linux-ext4, linux-f2fs-devel, linux-mtd,
linux-nfs, linux-ntfs-dev, linux-scsi, logfs, ocfs2-devel,
reiserfs-devel, samba-technical, xfs
On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> The possible uses I can see for non-ktime_t types in the kernel are:
> * inodes need 96 bit timestamps to represent the full range of values
> that can be stored in a file system, you made a convincing argument
> for that. Almost everything else can fit into 64 bit on a 32-bit
> kernel, in theory also on a 64-bit kernel if we want that.
Just to be pedantic, inodes don't *need* 96 bit timestamps - some
filesystems can *support up to* 96 bit timestamps. If the kernel
only supports 64 bit timestamps and that's all the kernel can
represent, then the upper bits of the 96 bit on-disk inode
timestamps simply remain zero.
If you move the filesystem between kernels with different time
ranges, then the filesystem needs to be able to tell the kernel what
its supported range is. This is where having the VFS limit the
range of supported timestamps is important: the limit is the
min(kernel range, filesystem range). This allows the filesystems
to be independent of the kernel time representation, and the kernel
to be independent of the physical filesystem time encoding....
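A minimal sketch of that limit, assuming a hypothetical struct
time_range that the kernel and each filesystem would fill in (nothing
like this exists in the VFS today, and what error to return for an
out-of-range value is undecided):

```c
#include <stdint.h>

/* Hypothetical: representable range of seconds, inclusive. */
struct time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Effective range is the intersection of what the kernel and the
 * filesystem can each represent. */
struct time_range effective_range(struct time_range kernel,
				  struct time_range fs)
{
	struct time_range r;

	r.min_sec = kernel.min_sec > fs.min_sec ? kernel.min_sec : fs.min_sec;
	r.max_sec = kernel.max_sec < fs.max_sec ? kernel.max_sec : fs.max_sec;
	return r;
}

/* Range check at the VFS boundary: 0 if sec fits, -1 otherwise. */
int check_sec(struct time_range r, int64_t sec)
{
	return (sec < r.min_sec || sec > r.max_sec) ? -1 : 0;
}
```

For example, a kernel limited to a signed 32-bit time_t combined with a
filesystem that stores unsigned 32-bit seconds ends up with the range
0..INT32_MAX.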
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-03 21:38 ` Dave Chinner
@ 2014-06-04 15:03 ` Arnd Bergmann
2014-06-04 17:30 ` Nicolas Pitre
0 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:03 UTC (permalink / raw)
To: Dave Chinner
Cc: H. Peter Anvin, Joseph S. Myers, linux-kernel, linux-arch,
john.stultz, hch, tglx, geert, lftan, linux-fsdevel, ceph-devel,
cluster-devel, coda, codalist, fuse-devel, linux-afs,
linux-btrfs, linux-cifs, linux-ext4, linux-f2fs-devel, linux-mtd,
linux-nfs, linux-ntfs-dev, linux-scsi, logfs, ocfs2-devel,
reiserfs-devel, samba-technical, xfs
On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> > that can be stored in a file system, you made a convincing argument
> > for that. Almost everything else can fit into 64 bit on a 32-bit
> > kernel, in theory also on a 64-bit kernel if we want that.
>
> Just to be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.
I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.
> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> its supported range is. This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be independent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding....
I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.
Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-02 21:02 ` Joseph S. Myers
@ 2014-06-04 15:05 ` Arnd Bergmann
0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-04 15:05 UTC (permalink / raw)
To: Joseph S. Myers
Cc: linux-kernel, linux-arch, john.stultz, hch, tglx, geert, lftan,
hpa, linux-fsdevel, ceph-devel, cluster-devel, coda, codalist,
fuse-devel, linux-afs, linux-btrfs, linux-cifs, linux-ext4,
linux-f2fs-devel, linux-mtd, linux-nfs, linux-ntfs-dev,
linux-scsi, logfs, ocfs2-devel, reiserfs-devel, samba-technical,
xfs
On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
>
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
>
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).
Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!
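For reference, the overflow check in such a fallback path could look
roughly like the sketch below. The function name and the choice of
EOVERFLOW are assumptions for illustration only; neither glibc nor the
kernel has settled on the error handling yet:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical fallback helper: narrow a 64-bit seconds value before
 * handing it to a legacy 32-bit kernel interface. Returns 0 on
 * success or -EOVERFLOW if the value cannot be represented.
 */
int narrow_time64(int64_t sec64, int32_t *sec32)
{
	if (sec64 < INT32_MIN || sec64 > INT32_MAX)
		return -EOVERFLOW;
	*sec32 = (int32_t)sec64;
	return 0;
}
```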
Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 15:03 ` Arnd Bergmann
@ 2014-06-04 17:30 ` Nicolas Pitre
2014-06-04 19:24 ` Arnd Bergmann
0 siblings, 1 reply; 124+ messages in thread
From: Nicolas Pitre @ 2014-06-04 17:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Dave Chinner, hch, linux-mtd, H. Peter Anvin, logfs, linux-afs,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, ceph-devel,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs,
linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel,
ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Wed, 4 Jun 2014, Arnd Bergmann wrote:
> On Tuesday 03 June 2014, Dave Chinner wrote:
> > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > filesystems can *support up to* 96 bit timestamps. If the kernel
> > only supports 64 bit timestamps and that's all the kernel can
> > represent, then the upper bits of the 96 bit on-disk inode
> > timestamps simply remain zero.
>
> I meant the reverse: since we have file systems that can store
> 96-bit timestamps when using 64-bit kernels, we need to extend
> 32-bit kernels to have the same internal representation so we
> can actually read those file systems correctly.
>
> > If you move the filesystem between kernels with different time
> > ranges, then the filesystem needs to be able to tell the kernel what
> > its supported range is. This is where having the VFS limit the
> > range of supported timestamps is important: the limit is the
> > min(kernel range, filesystem range). This allows the filesystems
> > to be independent of the kernel time representation, and the kernel
> > to be independent of the physical filesystem time encoding....
>
> I agree it makes sense to let the kernel know about the limits
> of the file system it accesses, but for the reverse, we're probably
> better off just making the kernel representation large enough (i.e.
> 96 bits) so it can work with any known file system.
Depends... 96 bit handling may get prohibitive on 32-bit archs.
The important point here is for the kernel to be able to represent the
time _range_ used by any known filesystem, not necessarily the time
_precision_.
For example, a 64 bit representation can be made of 40 bits for seconds
spanning 34865 years, and 24 bits for fractional seconds providing
precision down to 60 nanosecs. That ought to be plenty good on 32 bit
systems while still being cheap to handle.
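The numbers work out as 2^40 seconds / (365 * 86400) = roughly 34865
years, and 2^-24 seconds = roughly 59.6 ns. A minimal sketch of such a
40.24 fixed-point encoding follows; the function names are made up for
illustration, and it treats seconds as unsigned, which a real
implementation would have to decide on:

```c
#include <stdint.h>

#define FRAC_BITS 24

/* Pack seconds and nanoseconds into 40.24 fixed point. The caller
 * must ensure sec < 2^40 and nsec < 10^9. */
uint64_t pack_40_24(uint64_t sec, uint32_t nsec)
{
	/* nsec * 2^24 / 10^9, rounded down; fits since nsec < 2^30 */
	uint64_t frac = ((uint64_t)nsec << FRAC_BITS) / 1000000000u;

	return (sec << FRAC_BITS) | frac;
}

uint64_t unpack_sec(uint64_t t)
{
	return t >> FRAC_BITS;
}

/* Recover nanoseconds; the result is at most about 60 ns below the
 * value that was packed. */
uint32_t unpack_nsec(uint64_t t)
{
	uint64_t frac = t & ((1u << FRAC_BITS) - 1);

	return (uint32_t)((frac * 1000000000u) >> FRAC_BITS);
}
```

Comparing two such values is a single 64-bit compare, which is where
the "cheap to handle" part comes from.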
Nicolas

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-04 17:30 ` Nicolas Pitre
@ 2014-06-04 19:24 ` Arnd Bergmann
2014-06-05 0:10 ` H. Peter Anvin
0 siblings, 1 reply; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-04 19:24 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Dave Chinner, hch, linux-mtd, H. Peter Anvin, logfs, linux-afs,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, ceph-devel,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs,
linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel,
ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
>
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just to be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> >
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> >
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > its supported range is. This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be independent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding....
> >
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
>
> Depends... 96 bit handling may get prohibitive on 32-bit archs.
>
> The important point here is for the kernel to be able to represent the
> time _range_ used by any known filesystem, not necessarily the time
> _precision_.
>
> For example, a 64 bit representation can be made of 40 bits for seconds
> spanning 34865 years, and 24 bits for fractional seconds providing
> precision down to 60 nanosecs. That ought to be plenty good on 32 bit
> systems while still being cheap to handle.
I have checked earlier that we don't do any computation on inode
time stamps in common code, we just pass them around, so there is
very little runtime overhead. There is a small bit of space overhead
(12 bytes) per inode, but that structure is already on the order of
500 bytes.
For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.
Arnd

*Re: [RFC 00/32] making inode time stamps y2038 ready
2014-06-05 0:10 ` H. Peter Anvin
@ 2014-06-10 9:54 ` Arnd Bergmann
0 siblings, 0 replies; 124+ messages in thread
From: Arnd Bergmann @ 2014-06-10 9:54 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Nicolas Pitre, Dave Chinner, hch, linux-mtd, logfs, linux-afs,
Joseph S. Myers, linux-arch, linux-cifs, linux-scsi, ceph-devel,
cluster-devel, coda, geert, linux-ext4, codalist, fuse-devel,
reiserfs-devel, xfs, john.stultz, tglx, linux-nfs,
linux-ntfs-dev, samba-technical, linux-kernel, linux-f2fs-devel,
ocfs2-devel, linux-fsdevel, lftan, linux-btrfs
On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> >
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> >
>
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?
Probably very little overhead for most uses, it's more the regression
potential in the less common parts of the kernel I'm worried about.
There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:
arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 188 5566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320 10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406 10886
I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user
interfaces.
IMHO this work is helped if we change the uses to a new type
as we find the problems. This lets us do the work one subsystem
at a time and avoid accidental ABI changes. I don't care much what
type that will be, and having a 96-bit type will certainly work
well in a lot of cases, but I don't see a strong reason to use
that over other types, especially when they can be more efficient.
Arnd