[ANNOUNCE] Contributing Alibaba's Blink

Stephan Ewen
Dear Flink Community!

Some of you may have heard it already from announcements or from a Flink
Forward talk: Alibaba has decided to open source its in-house improvements
to Flink, called Blink! First of all, big thanks to the team that developed
these improvements and made this contribution possible!

Blink has some very exciting enhancements, most prominently on the Table
API/SQL side and the unified execution of these programs. For batch
(bounded) data, the SQL execution has full TPC-DS coverage (which is a big
deal), and the execution is more than 10x faster than the current SQL
runtime in Flink. Blink has also added support for catalogs, improved the
failover speed of batch queries and the resource management. It also makes
some good steps in the direction of more deeply unifying the batch and
streaming execution.

The proposal is to merge Blink's enhancements into Flink, to give Flink's
SQL/Table API and execution a big boost in usability and performance.

Just to avoid any confusion: This is not a suggested change of focus to
batch processing, nor would this break with any of the streaming
architecture and vision of Flink. This contribution follows very much the
principle of "batch is a special case of streaming". As a special case,
batch makes special optimizations possible. In its current state, Flink
does not exploit many of these optimizations. This contribution adds
exactly these optimizations and makes the streaming model of Flink
applicable to harder batch use cases.

Assuming that the community is excited about this as well, and in favor of
these enhancements to Flink's capabilities, below are some thoughts on how
this contribution and integration could work.

--- Making the code available ---

At the moment, the Blink code is in the form of a big Flink fork (rather
than isolated patches on top of Flink), so the integration is unfortunately
not as easy as merging a few patches or pull requests.

To support a non-disruptive merge of such a big contribution, I believe it
makes sense to make the code of the fork available in the Flink project
first. From there on, we can start to work on the details for merging the
enhancements, including the refactoring of the necessary parts in the Flink
master and the Blink code to make a merge possible without repeatedly
breaking compatibility.

The first question is where to put the code of the Blink fork during the
merging procedure. My first thought was to temporarily add a repository
(like "flink-blink-staging"), but we could also put it into a special
branch in the main Flink repository.


I will start a separate thread to discuss a possible strategy for handling
and merging such a big contribution.

Best,
Stephan

Re: [ANNOUNCE] Contributing Alibaba's Blink

Xiaowei Jiang
Thanks Stephan! We are hoping to make the process as non-disruptive as
possible to the Flink community. Making the Blink codebase public is the
first step that hopefully facilitates further discussions.
Xiaowei

In reply to Stephan Ewen's post of Monday, January 21, 2019, 11:46:28 AM PST.

Re: [ANNOUNCE] Contributing Alibaba's Blink

Shaoxuan Wang
Big +1 to contributing the Blink codebase directly to the Apache Flink project.
Looking forward to the new journey.

Regards,
Shaoxuan

On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <[hidden email]> wrote:

>  Thanks Stephan! We are hoping to make the process as non-disruptive as
> possible to the Flink community. Making the Blink codebase public is the
> first step that hopefully facilitates further discussions.
> Xiaowei

Re: [ANNOUNCE] Contributing Alibaba's Blink

Driesprong, Fokko
Great news Stephan!

Why not make the code available by having a fork of Flink on Alibaba's
GitHub account? This would allow us to do easy diffs in the GitHub UI and
create PRs of cherry-picked commits if needed. I can imagine that the
Blink codebase has a lot of branches by itself, so just pushing a couple of
branches to the main Flink repo is not ideal. Looking forward to it!

Cheers, Fokko





On Tue, Jan 22, 2019 at 03:48, Shaoxuan Wang <[hidden email]> wrote:

> big +1 to contribute Blink codebase directly into the Apache Flink project.
> Looking forward to the new journey.
>
> Regards,
> Shaoxuan

Re: [ANNOUNCE] Contributing Alibaba's Blink

Kurt Young
Hi Driesprong,

Glad to hear that you're interested in Blink's code. Actually, Blink has
only one branch, so either a separate repo or a branch in the Flink repo
would work for sharing Blink's code.

Best,
Kurt


On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko <[hidden email]>
wrote:

> Great news Stephan!
>
> Why not make the code available by having a fork of Flink on Alibaba's
> Github account. This will allow us to do easy diff's in the Github UI and
> create PR's of cherry-picked commits if needed. I can imagine that the
> Blink codebase has a lot of branches by itself, so just pushing a couple of
> branches to the main Flink repo is not ideal. Looking forward to it!
>
> Cheers, Fokko

Re: [ANNOUNCE] Contributing Alibaba's Blink

Timo Walther-2
Thanks for driving these efforts, Stephan! Great news that the Blink
code base will be available for everyone soon. I already got access to
it and the added functionality and improved architecture are impressive.
There will be nice additions to Flink.

I guess the Blink code base will be continuously updated while the Flink
community merges chunks of it, right? If yes, I would also be in favor
of a separate repository similar to flink-shaded.

Regards,
Timo


On Jan 22, 2019 at 09:20, Kurt Young wrote:

> Hi Driesprong,
>
> Glad to hear that you're interested with blink's codes. Actually, blink
> only has one branch by itself, so either a separated repo or a flink's
> branch works for blink's code share.
>
> Best,
> Kurt

Re: [ANNOUNCE] Contributing Alibaba's Blink

jincheng sun
Thanks Stephan! This is very exciting news for the Flink community.

I recommend creating a branch for Blink in the Flink repository. As with
feature development, the Blink branch would be a branch with many
enhancements, and the enhanced functionality would be merged continuously
into Flink master.

Cheers,
Jincheng

On Tue, Jan 22, 2019 at 4:45 PM, Timo Walther <[hidden email]> wrote:

> Thanks for driving these efforts, Stephan! Great news that the Blink
> code base will be available for everyone soon. I already got access to
> it and the added functionality and improved architecture is impressive.
> There will be nice additions to Flink.
>
> I guess the Blink code base will be continuously updated while the Flink
> community merged chunks of it, right? If yes, I would also be in favor
> of a separate repository similar to flink-shaded.
>
> Regards,
> Timo

Re: [ANNOUNCE] Contributing Alibaba's Blink

Ufuk Celebi-2
In reply to this post by Kurt Young
Hey Stephan and others,

Thanks for the summary. I'm very excited about the outlined improvements. :-)

Separate branch vs. fork: I'm fine with either of the suggestions.
Depending on the expected strategy for merging the changes, expected
number of additional changes, etc., either one or the other approach
might be better suited.

– Ufuk

On Tue, Jan 22, 2019 at 9:20 AM Kurt Young <[hidden email]> wrote:

>
> Hi Driesprong,
>
> Glad to hear that you're interested with blink's codes. Actually, blink
> only has one branch by itself, so either a separated repo or a flink's
> branch works for blink's code share.
>
> Best,
> Kurt

Re: [ANNOUNCE] Contributing Alibaba's Blink

Dominik Wosiński
Hey!
I also think that creating a separate branch for Blink in the Flink repo is a
better idea than creating a fork, as IMHO it will allow merging changes
more easily.

Best Regards,
Dom.


Re: [ANNOUNCE] Contributing Alibaba's Blink

Jark
Great news! Looking forward to the new wave of developments.

If Blink needs to be continuously updated, with bug fixes and version
releases, maybe a separate repository is a better idea.

Best,
Jark


Re: [ANNOUNCE] Contributing Alibaba's Blink

Hequn Cheng
Hi all,

@Stephan Thanks a lot for driving these efforts. I think a lot of people
are already waiting for this.
+1 for opening the Blink source code.
Either a separate repository or a special branch is OK for me. Hopefully,
this will not last too long.

Best, Hequn



Re: [ANNOUNCE] Contributing Alibaba's Blink

Zhijiang(wangzhijiang999)
Glad to see this announcement. I have recently heard many users in China asking when Blink would be open sourced, so thanks, Stephan, for making this happen.

All the enhancements and features in Blink will eventually be contributed and merged into Flink in a more fine-grained way. Until Blink and Flink reach the same point, users might use Blink to enjoy some of its advantages early.
So from the user experience perspective, Blink might have to push changes such as bug fixes before it is completely merged into Flink. This should be taken into account when making the decision.

Best,
Zhijiang




Re: [ANNOUNCE] Contributing Alibaba's Blink

Kurt Young
In reply to this post by Hequn Cheng
Thanks @Stephan for this exciting announcement!

From my point of view, I would prefer to use a branch. It makes the message
"Blink is part of Flink" more straightforward and clear.

Besides the location of Blink's code, there are some other questions, like
what version we should use and where we put Blink's documentation.
Currently, we use "1.5.1-blink-r0" as Blink's version, since Blink was
forked from Flink 1.5.1. We also added some docs to Blink, just as Flink
did. Can Blink use a website like
"https://ci.apache.org/projects/flink/flink-docs-release-1.7/" to host all of
Blink's docs, with the URL changed to something like
https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?

Best,
Kurt



Re: [ANNOUNCE] Contributing Alibaba's Blink

Timo Walther-2
Hi Kurt,

I would not make Blink's documentation visible to users or search
engines via a website. Otherwise this would communicate that Blink is an
official release. I would suggest putting the Blink docs into `/docs`;
people can build them with `./docs/build.sh -pi` if they are interested.
I would not invest time in setting up a docs infrastructure.

Regards,
Timo



Re: [ANNOUNCE] Contributing Alibaba's Blink

Kurt Young
Hi Timo,

What about the JAR files? Will Blink's JARs be uploaded to the Apache
repository? If not, I think it will be very inconvenient for users who
want to try Blink and consult the documentation when they need help.

Best,
Kurt



Re: [ANNOUNCE] Contributing Alibaba's Blink

Timo Walther-2
As far as I know, we will not provide any binaries, only the
source code. JAR files on Apache servers would need an official
voting/release process. Interested users can build Blink themselves
using `mvn clean package`.

@Stephan: Please correct me if I'm wrong.

Regards,
Timo

On 23.01.19 at 11:16, Kurt Young wrote:

> Hi Timo,
>
> What about the jar files, will blink's jar be uploaded to apache
> repository? If not, i think it will be very inconvenient for users who
> wants to try blink and view the documents if they need some help from doc.
>
> Best,
> Kurt
>
>
> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther <[hidden email]> wrote:
>
>> Hi Kurt,
>>
>> I would not make the Blink's documentation visible to users or search
>> engines via a website. Otherwise this would communicate that Blink is an
>> official release. I would suggest to put the Blink docs into `/docs` and
>> people can build it with `./docs/build.sh -pi` if there are interested.
>> I would not invest time into setting up a docs infrastructure.
>>
>> Regards,
>> Timo
>>
>> On 23.01.19 at 08:56, Kurt Young wrote:
>>> Thanks @Stephan for this exciting announcement!
>>>
>>> >From my point of view, i would prefer to use branch. It makes the
>> message
>>> "Blink is pat of Flink" more straightforward and clear.
>>>
>>> Except for the location of blink codes, there are some other questions
>> like
>>> what version should should use, and where do we put blink's documents.
>>> Currently, we choose to use "1.5.1-blink-r0" as blink's version since
>> blink
>>> forked from Flink's 1.5.1. We also added some docs to blink just as Flink
>>> did. Can blink use a website like
>>> "https://ci.apache.org/projects/flink/flink-docs-release-1.7/" to put
>> all
>>> blink's docs, change it to something like
>>> https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng <[hidden email]>
>> wrote:
>>>> Hi all,
>>>>
>>>> @Stephan  Thanks a lot for driving these efforts. I think a lot of
>> people
>>>> is already waiting for this.
>>>> +1 for opening the blink source code.
>>>> Both a separate repository or a special branch is ok for me. Hopefully,
>>>> this will not last too long.
>>>>
>>>> Best, Hequn
>>>>
>>>>
>>>> On Tue, Jan 22, 2019 at 11:35 PM Jark Wu <[hidden email]> wrote:
>>>>
>>>>> Great news! Looking forward to the new wave of developments.
>>>>>
>>>>> If Blink needs to be continuously updated, fix bugs, release versions,
>>>>> maybe a separate repository is a better idea.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński <[hidden email]>
>> wrote:
>>>>>> Hey!
>>>>>> I also think that creating the separate branch for Blink in Flink repo
>>>>> is a
>>>>>> better idea than creating the fork as IMHO it will allow merging
>>>> changes
>>>>>> more easily.
>>>>>>
>>>>>> Best Regards,
>>>>>> Dom.
>>>>>>
>>>>>> On Tue, 22 Jan 2019 at 10:09, Ufuk Celebi <[hidden email]> wrote:
>>>>>>
>>>>>>> Hey Stephan and others,
>>>>>>>
>>>>>>> thanks for the summary. I'm very excited about the outlined
>>>>> improvements.
>>>>>>> :-)
>>>>>>>
>>>>>>> Separate branch vs. fork: I'm fine with either of the suggestions.
>>>>>>> Depending on the expected strategy for merging the changes, expected
>>>>>>> number of additional changes, etc., either one or the other approach
>>>>>>> might be better suited.
>>>>>>>
>>>>>>> – Ufuk
>>>>>>>
>>>>>>> On Tue, Jan 22, 2019 at 9:20 AM Kurt Young <[hidden email]> wrote:
>>>>>>>> Hi Driesprong,
>>>>>>>>
>>>>>>>> Glad to hear that you're interested with blink's codes. Actually,
>>>>> blink
>>>>>>>> only has one branch by itself, so either a separated repo or a
>>>>> flink's
>>>>>>>> branch works for blink's code share.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko
>>>>> <[hidden email]
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Great news Stephan!
>>>>>>>>>
>>>>>>>>> Why not make the code available by having a fork of Flink on
>>>>>> Alibaba's
>>>>>>>>> Github account. This will allow us to do easy diff's in the
>>>> Github
>>>>> UI
>>>>>>> and
>>>>>>>>> create PR's of cherry-picked commits if needed. I can imagine
>>>> that
>>>>>> the
>>>>>>>>> Blink codebase has a lot of branches by itself, so just pushing a
>>>>>>> couple of
>>>>>>>>> branches to the main Flink repo is not ideal. Looking forward to
>>>>> it!
>>>>>>>>> Cheers, Fokko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 22 Jan 2019 at 03:48, Shaoxuan Wang <[hidden email]> wrote:
>>>>>>>>>> big +1 to contribute Blink codebase directly into the Apache
>>>>> Flink
>>>>>>>>> project.
>>>>>>>>>> Looking forward to the new journey.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Shaoxuan
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <
>>>>> [hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>>    Thanks Stephan! We are hoping to make the process as
>>>>>>> non-disruptive as
>>>>>>>>>>> possible to the Flink community. Making the Blink codebase
>>>>> public
>>>>>>> is
>>>>>>>>> the
>>>>>>>>>>> first step that hopefully facilitates further discussions.
>>>>>>>>>>> Xiaowei
>>>>>>>>>>>
>>>>>>>>>>>       On Monday, January 21, 2019, 11:46:28 AM PST, Stephan
>>>> Ewen
>>>>> <
>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>    Dear Flink Community!
>>>>>>>>>>>
>>>>>>>>>>> Some of you may have heard it already from announcements or
>>>>> from
>>>>>> a
>>>>>>>>> Flink
>>>>>>>>>>> Forward talk:
>>>>>>>>>>> Alibaba has decided to open source its in-house improvements
>>>> to
>>>>>>> Flink,
>>>>>>>>>>> called Blink!
>>>>>>>>>>> First of all, big thanks to team that developed these
>>>>>> improvements
>>>>>>> and
>>>>>>>>>> made
>>>>>>>>>>> this
>>>>>>>>>>> contribution possible!
>>>>>>>>>>>
>>>>>>>>>>> Blink has some very exciting enhancements, most prominently
>>>> on
>>>>>> the
>>>>>>>>> Table
>>>>>>>>>>> API/SQL side
>>>>>>>>>>> and the unified execution of these programs. For batch
>>>>> (bounded)
>>>>>>> data,
>>>>>>>>>> the
>>>>>>>>>>> SQL execution
>>>>>>>>>>> has full TPC-DS coverage (which is a big deal), and the
>>>>> execution
>>>>>>> is
>>>>>>>>> more
>>>>>>>>>>> than 10x faster
>>>>>>>>>>> than the current SQL runtime in Flink. Blink has also added
>>>>>>> support for
>>>>>>>>>>> catalogs,
>>>>>>>>>>> improved the failover speed of batch queries and the resource
>>>>>>>>> management.
>>>>>>>>>>> It also
>>>>>>>>>>> makes some good steps in the direction of more deeply
>>>> unifying
>>>>>> the
>>>>>>>>> batch
>>>>>>>>>>> and streaming
>>>>>>>>>>> execution.
>>>>>>>>>>>
>>>>>>>>>>> The proposal is to merge Blink's enhancements into Flink, to
>>>>> give
>>>>>>>>> Flink's
>>>>>>>>>>> SQL/Table API and
>>>>>>>>>>> execution a big boost in usability and performance.
>>>>>>>>>>>
>>>>>>>>>>> Just to avoid any confusion: This is not a suggested change
>>>> of
>>>>>>> focus to
>>>>>>>>>>> batch processing,
>>>>>>>>>>> nor would this break with any of the streaming architecture
>>>> and
>>>>>>> vision
>>>>>>>>> of
>>>>>>>>>>> Flink.
>>>>>>>>>>> This contribution follows very much the principle of "batch
>>>> is
>>>>> a
>>>>>>>>> special
>>>>>>>>>>> case of streaming".
>>>>>>>>>>> As a special case, batch makes special optimizations
>>>> possible.
>>>>> In
>>>>>>> its
>>>>>>>>>>> current state,
>>>>>>>>>>> Flink does not exploit many of these optimizations. This
>>>>>>> contribution
>>>>>>>>>> adds
>>>>>>>>>>> exactly these
>>>>>>>>>>> optimizations and makes the streaming model of Flink
>>>> applicable
>>>>>> to
>>>>>>>>> harder
>>>>>>>>>>> batch use cases.
>>>>>>>>>>>
>>>>>>>>>>> Assuming that the community is excited about this as well,
>>>> and
>>>>> in
>>>>>>> favor
>>>>>>>>>> of
>>>>>>>>>>> these enhancements
>>>>>>>>>>> to Flink's capabilities, below are some thoughts on how this
>>>>>>>>> contribution
>>>>>>>>>>> and integration
>>>>>>>>>>> could work.
>>>>>>>>>>>
>>>>>>>>>>> --- Making the code available ---
>>>>>>>>>>>
>>>>>>>>>>> At the moment, the Blink code is in the form of a big Flink
>>>>> fork
>>>>>>>>> (rather
>>>>>>>>>>> than isolated
>>>>>>>>>>> patches on top of Flink), so the integration is unfortunately
>>>>> not
>>>>>>> as
>>>>>>>>> easy
>>>>>>>>>>> as merging a
>>>>>>>>>>> few patches or pull requests.
>>>>>>>>>>>
>>>>>>>>>>> To support a non-disruptive merge of such a big
>>>> contribution, I
>>>>>>> believe
>>>>>>>>>> it
>>>>>>>>>>> make sense to make
>>>>>>>>>>> the code of the fork available in the Flink project first.
>>>>>>>>>>>   From there on, we can start to work on the details for
>>>> merging
>>>>>> the
>>>>>>>>>>> enhancements, including
>>>>>>>>>>> the refactoring of the necessary parts in the Flink master
>>>> and
>>>>>> the
>>>>>>>>> Blink
>>>>>>>>>>> code to make a
>>>>>>>>>>> merge possible without repeatedly breaking compatibility.
>>>>>>>>>>>
>>>>>>>>>>> The first question is where do we put the code of the Blink
>>>>> fork
>>>>>>> during
>>>>>>>>>> the
>>>>>>>>>>> merging procedure?
>>>>>>>>>>> My first thought was to temporarily add a repository (like
>>>>>>>>>>> "flink-blink-staging"), but we could
>>>>>>>>>>> also put it into a special branch in the main Flink
>>>> repository.
>>>>>>>>>>> I will start a separate thread about discussing a possible
>>>>>>> strategy to
>>>>>>>>>>> handle and merge
>>>>>>>>>>> such a big contribution.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>

Re: [ANNOUNCE] Contributing Alibaba's Blink

Chesnay Schepler-3
From the ASF side, JAR files do not require a vote/release process; this
is at the discretion of the PMC.

However, I have my doubts whether at this time we could even create a
source release of Blink given that we'd have to vet the code-base first.

Even without a source release we could still distribute jars, but we would
not be allowed to advertise them to users, as they do not constitute an
official release.


Re: [ANNOUNCE] Contributing Alibaba's Blink

Becket Qin
Really excited to see Blink joining the Flink community!

My two cents regarding repo vs. branch: I am +1 for a branch in Flink.
Among many things, what's most important at this point is probably to make
the Blink code available to developers so people can discuss the merge
strategy. Creating a branch is probably one of the fastest ways to do
that. We can always create a separate repo later if necessary.

WRT the doc and jar distribution, it is true that we are going to do some
major refactoring of the code. But I can imagine some curious users may
still want to try out something in Blink, and it would be good if we can do
them a favor. Legal-wise, my hunch is that it is probably OK for someone to
just build the jars and docs and host them somewhere for convenience. But it
should be clear that this is just for convenience rather than an
official release from Apache (unless we would like to make it official).

Thanks,

Jiangjie (Becket) Qin


Re: [ANNOUNCE] Contributing Alibaba's Blink

Stephan Ewen
Nice to see this lively discussion.

*--- Branch Versus Repository ---*

Looks like this is converging towards pushing a branch.
How about simply naming the branch "blink-1.5"? That would be in line with
the 1.5 version branch of Flink, which is simply called "release-1.5".

*--- SGA ---*

The SGA (Software Grant Agreement) should be either filed already or in the
process of filing.

*--- Offering Jars for Blink ---*

As Chesnay and Timo mentioned, we cannot easily offer a "Release" of Blink
(source or binary), because that would require thoroughly checking licenses
and creating/bundling license files. That is a lot of work, as we recently
experienced again in the Flink master.

What we can do is upload compiled jar files and link to them somewhere in
the blink docs. We need to add a disclaimer that these are
convenience jars, and not an official Apache release. I hope that would
work for the users that are curious to try things out.

*--- Docs for Blink ---*

Do we need a versioned website here? If not, can we simply make this a
subsection of the current Flink snapshot docs?
Next to "Flink Development" and "Internals", we could have a section on
"Blink branch".
I think it is crucial, though, to make it clear that this is temporary and
will eventually be subsumed by the main release, just so that users do not
get confused.

Best,
Stephan


On Wed, Jan 23, 2019 at 12:23 PM Becket Qin <[hidden email]> wrote:

> Really excited to see Blink joining the Flink community!
>
> My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
> Among many things, what's most important at this point is probably to make
> Blink code available to the developers so people can discuss the merge
> strategy. Creating a branch is probably the one of the fastest way to do
> that. We can always create separate repo later if necessary.
>
> WRT the doc and jar distribution, It is true that we are going to have
> some major refactoring to the code. But I can imagine some curious users
> may still want to try out something in Blink and it would be good if we can
> do them a favor. Legal wise, my hunch is that it is probably OK for someone
> to just build the jars and docs, host it somewhere for convenience. But it
> should be clear that this is just for convenience purpose instead of an
> official release form Apache (unless we would like to make it official).
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler <[hidden email]>
> wrote:
>
>>  From the ASF side Jar files do notrequire a vote/release process, this
>> is at the discretion of the PMC.
>>
>> However, I have my doubts whether at this time we could even create a
>> source release of Blink given that we'd have to vet the code-base first.
>>
>> Even without source release we could still distribute jars, but would
>> not be allowed to advertise them to users as they do not constitute an
>> official release.
>>
>> On 23.01.2019 11:41, Timo Walther wrote:
>> > As far as I know it, we will not provide any binaries but only the
>> > source code. JAR files on Apache servers would need an official
>> > voting/release process. Interested users can build Blink themselves
>> > using `mvn clean package`.
>> >
>> > @Stephan: Please correct me if I'm wrong.
>> >
>> > Regards,
>> > Timo
>> >
>> > Am 23.01.19 um 11:16 schrieb Kurt Young:
>> >> Hi Timo,
>> >>
>> >> What about the jar files, will blink's jar be uploaded to apache
>> >> repository? If not, i think it will be very inconvenient for users who
>> >> wants to try blink and view the documents if they need some help from
>> >> doc.
>> >>
>> >> Best,
>> >> Kurt
>> >>
>> >>
>> >> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther <[hidden email]>
>> wrote:
>> >>
>> >>> Hi Kurt,
>> >>>
>> >>> I would not make the Blink's documentation visible to users or search
>> >>> engines via a website. Otherwise this would communicate that Blink
>> >>> is an
>> >>> official release. I would suggest to put the Blink docs into `/docs`
>> >>> and
>> >>> people can build it with `./docs/build.sh -pi` if there are
>> interested.
>> >>> I would not invest time into setting up a docs infrastructure.
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>> Am 23.01.19 um 08:56 schrieb Kurt Young:
>> >>>> Thanks @Stephan for this exciting announcement!
>> >>>>
>> >>>> >From my point of view, i would prefer to use branch. It makes the
>> >>> message
>> >>>> "Blink is pat of Flink" more straightforward and clear.
>> >>>>
>> >>>> Except for the location of blink codes, there are some other
>> questions
>> >>> like
>> >>>> what version should should use, and where do we put blink's
>> documents.
>> >>>> Currently, we choose to use "1.5.1-blink-r0" as blink's version since
>> >>> blink
>> >>>> forked from Flink's 1.5.1. We also added some docs to blink just as
>> >>>> Flink
>> >>>> did. Can blink use a website like
>> >>>> "https://ci.apache.org/projects/flink/flink-docs-release-1.7/" to
>> put
>> >>> all
>> >>>> blink's docs, change it to something like
>> >>>> https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?
>> >>>>
>> >>>> Best,
>> >>>> Kurt
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng <[hidden email]>
>> >>> wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> @Stephan  Thanks a lot for driving these efforts. I think a lot of
>> >>> people
>> >>>>> is already waiting for this.
>> >>>>> +1 for opening the blink source code.
>> >>>>> Both a separate repository or a special branch is ok for me.
>> >>>>> Hopefully,
>> >>>>> this will not last too long.
>> >>>>>
>> >>>>> Best, Hequn
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Jan 22, 2019 at 11:35 PM Jark Wu <[hidden email]> wrote:
>> >>>>>
>> >>>>>> Great news! Looking forward to the new wave of developments.
>> >>>>>>
>> >>>>>> If Blink needs to be continuously updated, fix bugs, release
>> >>>>>> versions,
>> >>>>>> maybe a separate repository is a better idea.
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Jark
>> >>>>>>
>> >>>>>> On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński <[hidden email]> wrote:
>> >>>>>>> Hey!
>> >>>>>>> I also think that creating a separate branch for Blink in the Flink
>> >>>>>>> repo is a better idea than creating a fork, as IMHO it will allow
>> >>>>>>> merging changes more easily.
>> >>>>>>>
>> >>>>>>> Best Regards,
>> >>>>>>> Dom.
>> >>>>>>>
>> >>>>>>> On Tue, 22 Jan 2019 at 10:09, Ufuk Celebi <[hidden email]> wrote:
>> >>>>>>>
>> >>>>>>>> Hey Stephan and others,
>> >>>>>>>>
>> >>>>>>>> thanks for the summary. I'm very excited about the outlined
>> >>>>>>>> improvements. :-)
>> >>>>>>>>
>> >>>>>>>> Separate branch vs. fork: I'm fine with either of the suggestions.
>> >>>>>>>> Depending on the expected strategy for merging the changes, the
>> >>>>>>>> expected number of additional changes, etc., either one or the other
>> >>>>>>>> approach might be better suited.
>> >>>>>>>>
>> >>>>>>>> – Ufuk
>> >>>>>>>>
>> >>>>>>>> On Tue, Jan 22, 2019 at 9:20 AM Kurt Young <[hidden email]> wrote:
>> >>>>>>>>> Hi Driesprong,
>> >>>>>>>>>
>> >>>>>>>>> Glad to hear that you're interested in Blink's code. Actually, Blink
>> >>>>>>>>> only has one branch itself, so either a separate repo or a branch
>> >>>>>>>>> in Flink works for sharing Blink's code.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Kurt
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko <[hidden email]>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Great news Stephan!
>> >>>>>>>>>>
>> >>>>>>>>>> Why not make the code available by having a fork of Flink on
>> >>>>>>>>>> Alibaba's GitHub account? This will allow us to do easy diffs in the
>> >>>>>>>>>> GitHub UI and create PRs of cherry-picked commits if needed. I can
>> >>>>>>>>>> imagine that the Blink codebase has a lot of branches by itself, so
>> >>>>>>>>>> just pushing a couple of branches to the main Flink repo is not
>> >>>>>>>>>> ideal. Looking forward to it!
>> >>>>>>>>>>
>> >>>>>>>>>> Cheers, Fokko
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, 22 Jan 2019 at 03:48, Shaoxuan Wang <[hidden email]> wrote:
>> >>>>>>>>>>> Big +1 to contributing the Blink codebase directly into the Apache
>> >>>>>>>>>>> Flink project.
>> >>>>>>>>>>> Looking forward to the new journey.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regards,
>> >>>>>>>>>>> Shaoxuan
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <[hidden email]> wrote:
>> >>>>>>>>>>>> Thanks Stephan! We are hoping to make the process as non-disruptive
>> >>>>>>>>>>>> as possible to the Flink community. Making the Blink codebase public
>> >>>>>>>>>>>> is the first step that hopefully facilitates further discussions.
>> >>>>>>>>>>>> Xiaowei
>> >>>>>>>>>>>>

Re: [ANNOUNCE] Contributing Alibaba's Blink

Shaoxuan Wang
Thanks Stephan,
The entire plan looks good to me. WRT the "Docs for Blink", a subsection
should be good enough if we just introduce the outline of what Blink has
changed. However, we have written detailed introductions to Blink, based on the
structure of the current Flink release documentation (those introductions are
distributed across the individual subsections). Does it make sense to create a
separate Blink document under the documentation section, say blink-1.5
(temporary, not a release)?

Regards,
Shaoxuan


On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen <[hidden email]> wrote:

> Nice to see this lively discussion.
>
> *--- Branch Versus Repository ---*
>
> Looks like this is converging towards pushing a branch.
> How about naming the branch simply "blink-1.5"? That would be in line with
> the 1.5 version branch of Flink, which is simply called "release-1.5".
>
> *--- SGA --- *
>
> The SGA (Software Grant Agreement) should be either filed already or in the
> process of filing.
>
> *--- Offering Jars for Blink ---*
>
> As Chesnay and Timo mentioned, we cannot easily offer a "Release" of Blink
> (source or binary), because that would require a thorough
> checking of licenses and creating/ bundling license files. That is a lot of
> work, as we recently experienced again in the Flink master.
>
> What we can do is upload compiled jar files and link to them somewhere in
> the blink docs. We need to add a disclaimer that these are
> convenience jars, and not an official Apache release. I hope that would
> work for the users that are curious to try things out.
>
> *--- Docs for Blink --- *
>
> Do we need a versioned website here? If not, can we simply make this a
> subsection of the current Flink snapshot docs?
> Next to "Flink Development" and "Internals", we could have a section on
> "Blink branch".
> I think it is crucial, though, to make it clear that this is temporary and
> will eventually be subsumed by the main release, just
> so that users do not get confused.
>
> Best,
> Stephan
>
>
> On Wed, Jan 23, 2019 at 12:23 PM Becket Qin <[hidden email]> wrote:
>
> > Really excited to see Blink joining the Flink community!
> >
> > My two cents regarding repo vs. branch: I am +1 for a branch in Flink.
> > Among many things, what's most important at this point is probably to
> > make the Blink code available to the developers so people can discuss the
> > merge strategy. Creating a branch is probably one of the fastest ways to do
> > that. We can always create a separate repo later if necessary.
> >
> > WRT the doc and jar distribution, it is true that we are going to have
> > some major refactoring of the code. But I can imagine some curious users
> > may still want to try out something in Blink, and it would be good if we
> > can do them a favor. Legally, my hunch is that it is probably OK for
> > someone to just build the jars and docs and host them somewhere for
> > convenience. But it should be clear that this is just for convenience
> > and not an official release from Apache (unless we would like to make it
> > official).
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler <[hidden email]>
> > wrote:
> >
> >> From the ASF side, jar files do not require a vote/release process; this
> >> is at the discretion of the PMC.
> >>
> >> However, I have my doubts whether at this time we could even create a
> >> source release of Blink given that we'd have to vet the code-base first.
> >>
> >> Even without source release we could still distribute jars, but would
> >> not be allowed to advertise them to users as they do not constitute an
> >> official release.
> >>
> >> On 23.01.2019 11:41, Timo Walther wrote:
> >> > As far as I know, we will not provide any binaries but only the
> >> > source code. JAR files on Apache servers would need an official
> >> > voting/release process. Interested users can build Blink themselves
> >> > using `mvn clean package`.
> >> >
> >> > @Stephan: Please correct me if I'm wrong.
> >> >
> >> > Regards,
> >> > Timo
> >> >
> >> > On 23.01.19 at 11:16, Kurt Young wrote:
> >> >> Hi Timo,
> >> >>
> >> >> What about the jar files? Will Blink's jars be uploaded to the Apache
> >> >> repository? If not, I think it will be very inconvenient for users who
> >> >> want to try Blink and consult the documentation when they need help.
> >> >>
> >> >> Best,
> >> >> Kurt
> >> >>
> >> >>
> >> >> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther <[hidden email]> wrote:
> >> >>
> >> >>> Hi Kurt,
> >> >>>
> >> >>> I would not make Blink's documentation visible to users or search
> >> >>> engines via a website. Otherwise this would communicate that Blink is an
> >> >>> official release. I would suggest putting the Blink docs into `/docs`,
> >> >>> and people can build them with `./docs/build.sh -pi` if they are
> >> >>> interested. I would not invest time into setting up a docs infrastructure.
> >> >>>
> >> >>> Regards,
> >> >>> Timo
> >> >>>
>