[DISCUSS] Policy on keeping layer alternatives in sync

Fabian Hueske
Hi,

as you all know, Flink has a layered architecture with multiple
alternatives for certain levels.
Examples are:
- Programming APIs: Java, Scala, (and Python in progress)
- Processing Backends: distributed runtime (former Nephele), Java
Collections, (and potentially Tez in the future)

The challenge with multiple alternatives that serve the same purpose is
that these should be in sync.
A feature that is added to the Java API should also be added to the Scala
API (and other APIs in the future). The same applies to new runtime
strategies and operators, such as outer joins.

I think we need a policy on how to keep the features of different layer
alternatives in sync.
With the recent update of the Scala API, a ScalaAPICompletenessTest was
added that checks whether the Scala API offers the same methods as the Java
API. Adding a feature only to the Java API breaks the build and requires
either adapting the Scala API as well or excluding the added methods from the
APICompletenessTest.
While this test is a great tool to make sure that the APIs are synced,
it basically requires that the APIs are always synced, i.e., a modification
of the Java API must go with an equivalent change to the Scala API.
If we make this a strict policy and force compatibility at all times,
contributors must know about several different technologies (Scala Compiler
Macros, Python, the implementation details of multiple runtime backends,
...). This sounds like a huge barrier to entry to me.
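
For illustration, a minimal Java sketch of how a completeness check of this
kind could work. It is not the actual ScalaAPICompletenessTest: the classes
JavaStyleDataSet and ScalaStyleDataSet and the helper missingMethods are
hypothetical stand-ins, and the comparison is assumed to be by method name
only. A method that exists in the reference API but not in the port is
reported unless it is listed in the exclusion set.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins for the two APIs; these are not Flink classes.
class JavaStyleDataSet {
    public void map() {}
    public void join() {}
    public void outerJoin() {} // feature added to the reference API only
}

class ScalaStyleDataSet {
    public void map() {}
    public void join() {}
}

final class ApiCompletenessSketch {

    /** Names of the public, non-synthetic methods declared directly on the class. */
    static Set<String> publicMethodNames(Class<?> api) {
        Set<String> names = new TreeSet<>();
        for (Method m : api.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    /** Reference methods that are neither present in the port nor explicitly excluded. */
    static Set<String> missingMethods(Class<?> reference, Class<?> port, Set<String> excluded) {
        Set<String> missing = publicMethodNames(reference);
        missing.removeAll(publicMethodNames(port));
        missing.removeAll(excluded);
        return missing;
    }

    public static void main(String[] args) {
        // No exclusions: the unported method is reported.
        System.out.println(missingMethods(
                JavaStyleDataSet.class, ScalaStyleDataSet.class,
                Collections.<String>emptySet()));
        // Excluding the method is the escape hatch mentioned above.
        System.out.println(missingMethods(
                JavaStyleDataSet.class, ScalaStyleDataSet.class,
                Collections.singleton("outerJoin")));
    }
}
```

With an empty exclusion set the new outerJoin method is reported as missing,
which is the point where the build would break; adding it to the exclusion
set silences the check until the port exists.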

To make it clear, I am definitely in favor of keeping APIs and backends in
sync.
However, I propose to enforce this only for releases, i.e., allow
out-of-sync APIs on the master branch and fix the APIs for releases.
With this additional requirement, we also need to think twice about which
features to add, as multiple components of the system will be affected.

What do you guys think?

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Ufuk Celebi-2
Hey Fabian,

thanks for bringing this up.

I would vote for a hard policy regarding the Scala and Java APIs, as these are our main user-facing APIs.

If there were a fundamental problem or a language feature that could not be supported in or ported to the other API, I would be OK with it being available in only one. But small additions to the APIs, like outer joins, which can be in sync should also be in sync.

If someone does not want to add the corresponding feature to the other APIs, I would go for a pull request that asks someone else to port the missing part.

I think it is very important for users to be able to assume that all APIs have the same "power". Otherwise we might end up in a situation (and I think we already had it with the broadcast variables for a time) where users have to pick the API that matches their use case rather than the one they prefer.

Best,

Ufuk

On 26 Sep 2014, at 10:43, Fabian Hueske <[hidden email]> wrote:

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Robert Metzger
Hi,

I'm also in favor of having a strict policy regarding the Java and Scala
APIs.
As I understand it, the new Scala API is a thin layer above the Java one,
so adding new methods should be straightforward (given that there are
plenty of examples as a reference).

Robert

On Fri, Sep 26, 2014 at 11:04 AM, Ufuk Celebi <[hidden email]> wrote:

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Kostas Tzoumas-2
If we allow out-of-sync APIs (and backends) until the time of a release,
aren't we just postponing the syncing problem to the time of the release,
which is a pretty bad time to have such a problem?


On Fri, Sep 26, 2014 at 8:49 PM, Robert Metzger <[hidden email]> wrote:

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Chesnay Schepler
I agree with Kostas and believe that postponing will straight up not
work, since people tend to be *very* busy close to a release, even
without having to port features to several APIs.

I furthermore don't think we will get anywhere by creating one policy to
rule them all (especially a rigid one), because there are fundamental
differences between a) the APIs and b) the scope of features. There is
also no point in setting up a policy when it is very likely that we won't
abide by it.

With the increasing number of APIs, it's quite a tall order to expect a
version for each of them from a single contributor. Even now that would
be 3 (Java, Scala, Streaming(?)), with 2 more to come in the somewhat
near future (Python, SQL (not sure if relevant)). It is a *massive*
entry barrier, as well as a major time investment on the contributor's
part. This should also hold for simple features (certainly at the
beginning).

If (and only if) Scala is as thin as I am made to believe, I would be for
a hard policy here. I would exclude the other APIs from this. The overhead
of getting to know all APIs and debugging unfamiliar code would eat
up way too much time, which could easily break our neck. It's not just
about syncing the APIs, but about doing so in an efficient manner. For the
other APIs I would much rather have 2-3 people per API who are somewhat
responsible for porting these features, preferably in a more concentrated
effort (aka batches).

On 27.9.2014 21:03, Kostas Tzoumas wrote:

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Aljoscha Krettek-2
We could use blocking issues on Jira to mark things that need to be
resolved before a release.

On Sat, Sep 27, 2014 at 11:53 PM, Chesnay Schepler <
[hidden email]> wrote:

Re: [DISCUSS] Policy on keeping layer alternatives in sync

Fabian Hueske
I like the idea of having a single PR for a feature that touches different
components (APIs, backends) and having multiple people contribute to it to
make it work for all alternatives.
This would ensure a synced code base, but it will take much more time to
get new features in. This might be a problem if a feature is required for
other features or asked for by some users.

I am not sure if the argument of increased workload towards a release is
true.
If a feature should go into a release, it must be implemented for all
APIs anyway. Maybe the chance that this is done at the end of a release
cycle is even higher if the feature is lingering around in a PR and is
available for only a subset of the APIs. But who knows...

Chesnay also has a point here. We might want to distinguish between
first-class APIs (backends), which are always in sync, and others, which might
be a bit behind...
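
A possible way to encode such a distinction, sketched in Java under
assumptions: the tier names, the TieredSyncPolicySketch class, and the
assignment of APIs to tiers are hypothetical and only illustrate the idea.
A gap in a first-class API blocks the build, while a gap in any other API is
only recorded for later porting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TieredSyncPolicySketch {

    // Hypothetical tiers; the thread does not define names for them.
    enum Tier { FIRST_CLASS, BEST_EFFORT }

    /** Gaps split into those that block the build and those that may lag behind. */
    static final class Verdict {
        final List<String> blocking = new ArrayList<>();
        final List<String> deferred = new ArrayList<>();

        boolean buildPasses() {
            return blocking.isEmpty();
        }
    }

    static Verdict evaluate(Map<String, Tier> tiers, Map<String, List<String>> missingByApi) {
        Verdict verdict = new Verdict();
        for (Map.Entry<String, List<String>> entry : missingByApi.entrySet()) {
            // Assumption: an API without a declared tier is treated as best effort.
            Tier tier = tiers.getOrDefault(entry.getKey(), Tier.BEST_EFFORT);
            for (String feature : entry.getValue()) {
                String gap = entry.getKey() + ": " + feature;
                (tier == Tier.FIRST_CLASS ? verdict.blocking : verdict.deferred).add(gap);
            }
        }
        return verdict;
    }

    public static void main(String[] args) {
        // Illustrative tier assignment only, not a decision taken in this thread.
        Map<String, Tier> tiers = new LinkedHashMap<>();
        tiers.put("Java", Tier.FIRST_CLASS);
        tiers.put("Scala", Tier.FIRST_CLASS);
        tiers.put("Python", Tier.BEST_EFFORT);

        Map<String, List<String>> missing = new LinkedHashMap<>();
        missing.put("Scala", Arrays.asList("outerJoin"));
        missing.put("Python", Arrays.asList("outerJoin"));

        Verdict verdict = evaluate(tiers, missing);
        System.out.println("blocking: " + verdict.blocking);
        System.out.println("deferred: " + verdict.deferred);
        System.out.println("build passes: " + verdict.buildPasses());
    }
}
```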



2014-09-29 9:56 GMT+02:00 Aljoscha Krettek <[hidden email]>:
