[Discuss] FLIP-13 Side Outputs in Flink

Chen Qin
Hey folks,

Please give feedback on FLIP-13!
https://cwiki.apache.org/confluence/display/FLINK/FLIP-13+Side+Outputs+in+Flink
JIRA task (it links to the Google doc):
https://issues.apache.org/jira/browse/FLINK-4460

Thanks,
Chen Qin

Re: [Discuss] FLIP-13 Side Outputs in Flink

Fabian Hueske-2
Hi Chen,

thanks for this interesting proposal. I think side output would be a very
valuable feature to have!

I went over the FLIP and have a few questions.

- Will multiple side outputs of the same type be supported?
- If I got it right, the FLIP proposes to change the signatures of many
user-defined functions (FlatMapFunction, WindowFunction, ...). Most of
these interfaces/classes are annotated with @Public, which means we cannot
change them in the Flink 1.x release line. What would be alternatives? I
can think of a) casting the Collector into a RichCollector (as you do in
your prototype) or b) retrieving the RichCollector from the RuntimeContext
that a RichFunction provides.
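To make alternative (a) concrete, here is a minimal sketch under stated assumptions. Collector, OutputTag and RichCollector are simplified stand-ins: Collector mirrors the shape of Flink's `org.apache.flink.util.Collector`, while OutputTag and RichCollector reuse the names from this thread with assumed shapes. CastingSketch, RecordingCollector and REJECTED are hypothetical. The user function keeps its existing signature and casts the Collector at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stand-in types so the sketch compiles without Flink. OutputTag and
// RichCollector reuse names from this thread; their shapes are assumptions.
class CastingSketch {

    interface Collector<T> {
        void collect(T record);
        void close();
    }

    /** Hypothetical tag naming a side output. */
    static final class OutputTag<T> {
        final String id;
        OutputTag(String id) { this.id = Objects.requireNonNull(id); }
    }

    /** Hypothetical collector that also accepts side-output records. */
    interface RichCollector<T> extends Collector<T> {
        <S> void collect(OutputTag<S> tag, S value);
    }

    /** What the runtime would pass in: records main and side output into lists. */
    static final class RecordingCollector<T> implements RichCollector<T> {
        final List<T> main = new ArrayList<>();
        final List<String> side = new ArrayList<>();

        @Override public void collect(T record) { main.add(record); }

        @Override public <S> void collect(OutputTag<S> tag, S value) {
            side.add(tag.id + ":" + value);
        }

        @Override public void close() {}
    }

    static final OutputTag<String> REJECTED = new OutputTag<>("rejected");

    /** Option (a): the user function keeps its current signature and casts. */
    static void flatMap(String line, Collector<Integer> out) {
        try {
            out.collect(Integer.parseInt(line.trim()));
        } catch (NumberFormatException e) {
            // Only works if the runtime really hands over a RichCollector;
            // otherwise the side record is silently dropped.
            if (out instanceof RichCollector) {
                ((RichCollector<Integer>) out).collect(REJECTED, line);
            }
        }
    }

    static RecordingCollector<Integer> demo() {
        RecordingCollector<Integer> out = new RecordingCollector<>();
        for (String line : new String[] {"1", "x", " 3"}) {
            flatMap(line, out);
        }
        return out;
    }
}
```

The `instanceof` guard shows the weakness of this approach: if the runtime ever passes a plain Collector, side records disappear without any error, and nothing in the function signature tells the user that side outputs are available.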

I'm not so familiar with the internals of the DataStream API, so I leave
comments on that to others.

Best, Fabian

2016-10-25 18:00 GMT+02:00 Chen Qin <[hidden email]>:

> Hey folks,
> [...]

Re: [Discuss] FLIP-13 Side Outputs in Flink

CPC
Is this only related to the DataStream API? This feature could be really useful for
ETL scenarios with the DataSet API as well.

On Oct 26, 2016 22:29, "Fabian Hueske" <[hidden email]> wrote:

> Hi Chen,
> [...]

Re: [Discuss] FLIP-13 Side Outputs in Flink

Fabian Hueske-2
Hi CPC,

I agree, support for side outputs would be nice for DataSet as well.
However, this is not easily possible because it would require an extensive
rewrite of the DataSet optimizer.
IMO, that's out of scope for this proposal.

Cheers, Fabian

2016-10-27 0:29 GMT+02:00 CPC <[hidden email]>:

> Is it just related to stream api? This feature could be really useful for
> etl scenarios with dataset api as well.
> [...]

Re: [Discuss] FLIP-13 Side Outputs in Flink

Chen Qin
Hi Fabian,

Thanks for your feedback, and sorry for the late reply.
My comments are inline. I will update the FLIP-13 wiki to reflect them.


> - Will multiple side outputs of the same type be supported?

This wasn't implemented in the prototype, but it should be easy to support,
since we have a unique id in the stream record.
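A minimal sketch of why an id makes same-type side outputs easy, assuming tags compare by id rather than by element type. OutputTag, TaggedRecord, emit, route, LATE and MALFORMED are hypothetical stand-ins, not the prototype's classes; TaggedRecord plays the role of a stream record carrying the tag it was emitted to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class SameTypeTagsSketch {

    /** Hypothetical tag; equality is by id, so two String tags stay distinct. */
    static final class OutputTag<T> {
        final String id;
        OutputTag(String id) { this.id = Objects.requireNonNull(id); }

        @Override public boolean equals(Object o) {
            return o instanceof OutputTag && ((OutputTag<?>) o).id.equals(id);
        }

        @Override public int hashCode() { return id.hashCode(); }
    }

    /** Hypothetical stream record that carries the tag it was emitted to. */
    static final class TaggedRecord {
        final OutputTag<?> tag;
        final Object value;
        TaggedRecord(OutputTag<?> tag, Object value) { this.tag = tag; this.value = value; }
    }

    static <S> TaggedRecord emit(OutputTag<S> tag, S value) {
        return new TaggedRecord(tag, value);
    }

    /** Splits records into per-tag buffers, keyed by tag id rather than element type. */
    static Map<OutputTag<?>, List<Object>> route(List<TaggedRecord> records) {
        Map<OutputTag<?>, List<Object>> byTag = new HashMap<>();
        for (TaggedRecord r : records) {
            byTag.computeIfAbsent(r.tag, k -> new ArrayList<>()).add(r.value);
        }
        return byTag;
    }

    // Two side outputs that share the element type String.
    static final OutputTag<String> LATE = new OutputTag<>("late");
    static final OutputTag<String> MALFORMED = new OutputTag<>("malformed");

    static Map<OutputTag<?>, List<Object>> demo() {
        List<TaggedRecord> records = new ArrayList<>();
        records.add(emit(LATE, "event-1"));
        records.add(emit(MALFORMED, "{bad json"));
        records.add(emit(LATE, "event-7"));
        return route(records);
    }
}
```

Because routing uses the id, `LATE` and `MALFORMED` end up in separate buffers even though both carry strings.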

> - If I got it right, the FLIP proposes to change the signatures of many
> user-defined functions (FlatMapFunction, WindowFunction, ...). Most of
> these interfaces/classes are annotated with @Public, which means we cannot
> change them in the Flink 1.x release line. What would be alternatives? I
> can think of
> a) casting the Collector into a RichCollector (as you do in
> your prototype) or

This is like a private magic API. It should be 100% compatible, but it is
not a good implementation.

> b) retrieving the RichCollector from the RuntimeContext
> that a RichFunction provides.

This seems the better option, yet many heavily used functions like FlatMap
would not get support. To support them, we would need to create some
redundant classes inheriting from RichFunction (e.g. implementing RichFlatMap
etc.). We might put these in a different package to isolate the impact of
this change.
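A minimal sketch of alternative (b), assuming a side-output hook reachable through the function's runtime context. SideOutputContext is hypothetical and is not a method of Flink's RuntimeContext; RichFlatMap, Tokenizer, OutputTag and EMPTY_LINES are simplified stand-ins for illustration only. It shows the trade-off mentioned above: only functions that extend the rich base class can reach the hook.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types only: SideOutputContext is a hypothetical hook, NOT part of
// Flink's RuntimeContext, and RichFlatMap is a simplified rich-function base.
class RuntimeContextSketch {

    static final class OutputTag<T> {
        final String id;
        OutputTag(String id) { this.id = id; }
    }

    interface Collector<T> {
        void collect(T record);
    }

    /** Hypothetical side-output hook exposed through the runtime context. */
    interface SideOutputContext {
        <S> void output(OutputTag<S> tag, S value);
    }

    /** Simplified rich function: the runtime injects the context before processing. */
    abstract static class RichFlatMap<IN, OUT> {
        private SideOutputContext context;

        void setRuntimeContext(SideOutputContext context) { this.context = context; }

        protected SideOutputContext getRuntimeContext() { return context; }

        abstract void flatMap(IN value, Collector<OUT> out);
    }

    static final OutputTag<String> EMPTY_LINES = new OutputTag<>("empty-lines");

    /** User function: tokens go to the main Collector, empty lines to the side output. */
    static final class Tokenizer extends RichFlatMap<String, String> {
        @Override
        void flatMap(String line, Collector<String> out) {
            if (line.isEmpty()) {
                getRuntimeContext().output(EMPTY_LINES, line);
                return;
            }
            for (String token : line.split("\\s+")) {
                out.collect(token);
            }
        }
    }

    /** Drives the function the way a runtime might; returns the main output. */
    static List<String> demo(List<String> sideSink) {
        List<String> main = new ArrayList<>();
        Tokenizer fn = new Tokenizer();
        fn.setRuntimeContext(new SideOutputContext() {
            @Override
            public <S> void output(OutputTag<S> tag, S value) {
                sideSink.add(tag.id + ":" + value);
            }
        });
        for (String line : new String[] {"a b", "", "c"}) {
            fn.flatMap(line, main::add);
        }
        return main;
    }
}
```

An anonymous class is used for the context because Java lambdas cannot implement a generic method such as `output`.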

On Tue, Oct 25, 2016 at 9:00 AM, Chen Qin <[hidden email]> wrote:

> Hey folks,
> [...]



--
-Chen Qin

Re: [Discuss] FLIP-13 Side Outputs in Flink

Chen Qin
Adding another abstract method to the Collector interface is also considerably
easier from an API backward-compatibility point of view.

The cost could be either

1) many classes with an empty implementation of the
*<S> void collect(OutputTag<S> tag, S value)* method, or

2) splitting the stream-record-related classes that implement the Collector
interface from the graph-generator-related classes. For the stream-record
ones, we might be able to implement *collect(T out)* by calling
*<S> void collect(OutputTag<S> tag, S value)*. The graph-generator ones we
keep as they are.
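A minimal sketch of both costs, using stand-in types rather than Flink's real `org.apache.flink.util.Collector`. The Collector below gains the extra abstract `<S> void collect(OutputTag<S> tag, S value)` method. CountingCollector illustrates cost 1 (an empty implementation) and RecordCollector illustrates cost 2 (a stream-record collector that implements `collect(T)` by delegating to the tagged method). MAIN, OutputTag and the demo methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types, not Flink's Collector; names are illustrative only.
class CollectorExtensionSketch {

    static final class OutputTag<T> {
        final String id;
        OutputTag(String id) { this.id = id; }
    }

    interface Collector<T> {
        void collect(T record);

        /** The additional abstract method every implementer now has to provide. */
        <S> void collect(OutputTag<S> tag, S value);

        void close();
    }

    /** Cost 1: collectors that never see side outputs get an empty implementation. */
    static final class CountingCollector<T> implements Collector<T> {
        int count;

        @Override public void collect(T record) { count++; }

        @Override public <S> void collect(OutputTag<S> tag, S value) {
            // Intentionally empty: this collector has no notion of side outputs.
        }

        @Override public void close() {}
    }

    /** Hypothetical tag standing for the main output. */
    static final OutputTag<Object> MAIN = new OutputTag<>("main");

    /** Cost 2: a stream-record collector implements collect(T) via the tagged method. */
    static final class RecordCollector<T> implements Collector<T> {
        final List<String> emitted = new ArrayList<>();

        @Override public void collect(T record) { collect(MAIN, record); }

        @Override public <S> void collect(OutputTag<S> tag, S value) {
            // A real runtime would wrap value in a stream record carrying tag.id.
            emitted.add(tag.id + ":" + value);
        }

        @Override public void close() {}
    }

    static List<String> demoRecords() {
        RecordCollector<Integer> out = new RecordCollector<>();
        out.collect(42);
        out.collect(new OutputTag<String>("errors"), "bad input");
        return out.emitted;
    }

    static int demoCounting() {
        CountingCollector<String> out = new CountingCollector<>();
        out.collect("a");
        out.collect(new OutputTag<Integer>("ignored"), 1);
        return out.count;
    }
}
```

With the delegation in RecordCollector, main and side records share one code path, so only the tag distinguishes them downstream.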


On Wed, Nov 2, 2016 at 8:14 PM, Chen Qin <[hidden email]> wrote:

> Hi Fabian
> [...]



--
-Chen Qin

Re: [Discuss] FLIP-13 Side Outputs in Flink

Chen Qin
Dear Flink community members,

Please review and comment on https://github.com/apache/flink/pull/2982.

Thanks,
Chen