[DISCUSS] FLIP-1 : Fine grained recovery from task failures

[DISCUSS] FLIP-1 : Fine grained recovery from task failures

Stephan Ewen
Hi all!

Here is the very first FLIP (FLink Improvement Proposal): Fine grained
recovery from task failures

It describes a proposed enhancement for reducing the work done during
recovery.

https://docs.google.com/document/d/16S584XFzkfFu3MOfVCE0rHZ_JJgQrQuw9SXpanoMiMo

Please comment in this mail thread, or in the GoogleDoc.

Best,
Stephan

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Chesnay Schepler-3
Shouldn't the proposal be contained in the wiki instead of Google Docs?

On 12.07.2016 19:55, Stephan Ewen wrote:

> Hi all!
>
> Here is the very first FLIP (FLink Improvement Proposal): Fine grained
> recovery from task failures
>
> It describes a proposed enhancement for reducing the work done during
> recovery.
>
> https://docs.google.com/document/d/16S584XFzkfFu3MOfVCE0rHZ_JJgQrQuw9SXpanoMiMo
>
> Please comment in this mail thread, or in the GoogleDoc.
>
> Best,
> Stephan
>


Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Aljoscha Krettek-2
I added a FLIP document in the wiki:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

For now, this contains the link to the Google Doc and a link to this
discussion thread. Once a Jira is created for this it should also be added
there.

On Tue, 12 Jul 2016 at 20:11 Chesnay Schepler <[hidden email]> wrote:

> shouldn't the proposal be contained in the wiki instead of GoogleDocs?

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Ufuk Celebi-2
Thanks for this very first proposal! Both the proposed functionality
and the way you explained it are super nice. :-)

I think that this has been long overdue in Flink. :-) Having worked on
both the ExecutionGraph and IntermediateResults before, I agree that
these are the relevant components for this change.

Version 1:

- Conceptually, I agree that this is the way to go. I think it's
relatively straightforward to do this as you describe (minus all the
surprises during implementation ;-))
- Very nice explanation with the figures!
- Since FLIPs will probably also function as documentation, we might
link to the nice figures in [1] for people who are not familiar with
the details of the ExecutionGraph.

[1] https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html#jobmanager-data-structures

Version 2:

- I think that the changes to the intermediate results and pinning
will be straightforward.
- An important follow up for this (probably another FLIP?) will be how
we do memory management though. Right now the buffers for the
intermediate results come from the "network buffer pool", which is by
default very small (64MB). This is not a blocker for the
implementation of Version 2, but probably for a good user experience.
;-)

Public API changes:

- RestartStrategy: I would expect this to be interpreted as the
maximum total number of task failures.
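
To make that reading concrete, here is a minimal sketch of a limit that is
counted over all task failures of a job rather than per task. The class and
method names (FailureBudget, recordFailure, failuresOf) are hypothetical and
only for illustration; this is not Flink's RestartStrategy API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Flink. */
final class FailureBudget {
    private final int maxTotalFailures;
    private final Map<String, Integer> failuresPerTask = new HashMap<>();
    private int totalFailures = 0;

    FailureBudget(int maxTotalFailures) {
        this.maxTotalFailures = maxTotalFailures;
    }

    /**
     * Records one failure of the given task and reports whether a restart
     * is still allowed. The limit applies to the sum over all tasks.
     */
    boolean recordFailure(String taskId) {
        totalFailures++;
        failuresPerTask.merge(taskId, 1, Integer::sum);
        return totalFailures <= maxTotalFailures;
    }

    /** Number of failures seen so far for a single task. */
    int failuresOf(String taskId) {
        return failuresPerTask.getOrDefault(taskId, 0);
    }

    public static void main(String[] args) {
        FailureBudget budget = new FailureBudget(3);
        // Failures of different tasks draw from the same budget.
        System.out.println(budget.recordFailure("map-1"));
        System.out.println(budget.recordFailure("map-2"));
        System.out.println(budget.recordFailure("map-1"));
        // The fourth failure exceeds the total limit, whichever task it hits.
        System.out.println(budget.recordFailure("sink-1"));
    }
}
```

Under a per-task reading, each task would get its own counter and the fourth
failure above would still be allowed, which is why the interpretation matters
once only individual tasks are restarted.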

– Ufuk


On Wed, Jul 13, 2016 at 8:20 AM, Aljoscha Krettek <[hidden email]> wrote:

> I added a FLIP document in the wiki:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
>
> For now, this contains the link to the Google Doc and a link to this
> discussion thread. Once a Jira is created for this it should also be added
> there.

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Stephan Ewen
In reply to this post by Chesnay Schepler-3
I deliberately kept it initially in the Google Doc because it is easier to
comment.

I would move it to the wiki once we have passed the initial discussion and
have converged (or are close to convergence) on the design.

On Tue, Jul 12, 2016 at 8:11 PM, Chesnay Schepler <[hidden email]>
wrote:

> shouldn't the proposal be contained in the wiki instead of GoogleDocs?

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Robert Metzger
Thank you for writing the proposal!

I agree with Chesnay. I thought one of the main motivations of the new FLIP
process is to move the discussion onto ASF infra, to record the decision
making process on the mailing list and to make ongoing activities more
visible to the community.

If we want to use google docs because of the commenting features, we should
consider implementing a tool mirroring the comments onto the mailing lists.

If you want to change the FLIP process, I would suggest moving the
discussion about Google Docs to the FLIP 1 thread.



On Wed, Jul 13, 2016 at 2:07 PM, Stephan Ewen <[hidden email]> wrote:

> I deliberately kept it initially in the Google Doc because it is easier to
> comment.
>
> I would move it to the wiki once we have passed the initial discussion and
> have converged (or are close to convergence) on the design.

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Aljoscha Krettek-2
+1
This was my original motivation. I saw a lot of discussion on the design
docs I initiated but none of this was reflected on the ML.

On Wed, 13 Jul 2016 at 17:11 Robert Metzger <[hidden email]> wrote:

> Thank you for writing the proposal!
>
> I agree with Chesnay. I thought one of the main motivations of the new FLIP
> process is to move the discussion onto ASF infra, to record the decision
> making process on the mailing list and to make ongoing activities more
> visible to the community.
>
> If we want to use google docs because of the commenting features, we should
> consider implementing a tool mirroring the comments onto the mailing lists.
>
> If you want to change the FLIP process, I would suggest to move the
> discussion about Google Docs to the FLIP 1 thread.

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Stephan Ewen
So far the reactions were positive towards the proposal.
If no one objects, I am going to assume lazy consensus and mark this FLIP
as adopted.

For the process, we'll post future FLIPs in the wiki directly, rather than
in a Google Doc first.

On Wed, Jul 13, 2016 at 5:17 PM, Aljoscha Krettek <[hidden email]>
wrote:

> +1
> This was my original motivation. I saw a lot of discussion on the design
> docs I initiated but none of this was reflected on the ML.

Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Paris Carbone
+1

I like the proposal!

Docs can sometimes work as a supplementary medium to the wiki, in my view.
The wiki is definitely much preferred for tracking/archiving purposes, but it
might not hurt if we sometimes also reference a doc with a somewhat evolving
prototype there.

> On 15 Jul 2016, at 15:37, Stephan Ewen <[hidden email]> wrote:
>
> So far the reactions were positive towards the proposal.
> If no one objects, I am going to assume lazy consensus and mark this FLIP
> as adopted.
>
> For the process, we'll post future FLIPs in the wiki directly, rather than
> in a Google Doc first.


Re: [DISCUSS] FLIP-1 : Fine grained recovery from task failures

Stephan Ewen
I migrated the doc to the wiki.

On Fri, Jul 15, 2016 at 4:33 PM, Paris Carbone <[hidden email]> wrote:

> +1
>
> I like the proposal!
>
> Docs can work as supplementary medium to the wiki sometimes, in my view.
> The wiki is definitely much preferred for tracking/archiving purposes but
> it might not hurt if we sometimes also reference a doc with a somewhat
> evolving prototype there.