[DISCUSS] FLIP-22: Eager State Declaration


Tzu-Li (Gordon) Tai
Hi Flink devs!

I would like to propose the following FLIP - Eager State Declaration for Flink managed state: https://cwiki.apache.org/confluence/display/FLINK/FLIP-22%3A+Eager+State+Declaration.
The proposal is a result of some offline discussions with Aljoscha Krettek, Stephan Ewen, and Stefan Richter.

With how the current managed state declaration interfaces work, users may declare state lazily while jobs are running.
This behavior is a direct blocker for several state management features we wish to make a reality in the future.
I also see it as an opportunity to unify the interfaces for keyed / operator managed state declarations at the API level, and to improve the user experience for general use cases.

The most important part of the required changes is deprecating the existing APIs and introducing new state declaration interfaces.
Since this would be a rework of the state interfaces, it would be great to hear thoughts on this and make sure that the proposal is what we want in the long run!

Happy to hear feedback on this :)

Cheers,
Gordon

Re: [DISCUSS] FLIP-22: Eager State Declaration

Chesnay Schepler-3
Could you add an example to the FLIP for how a user can register a state
with the methods in the RichFunction interface?
Currently it only contains an example for the annotation option.

These methods look like they are called by the user, but that doesn't
really make sense to me as after all the user has to
implement them.

To me a more intuitive signature would be

    void registerKeyedState(StateDescriptorRegistry registry);

that is called by the system when a UDF is provided by a user, who then
registers all the state descriptors they have.
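
A minimal sketch of how such a system-invoked callback might look. Only the method name registerKeyedState and the parameter type name StateDescriptorRegistry come from the signature above; the StateDescriptor stand-in, the register/declared methods, the DeclaresKeyedState interface, and the demo classes are hypothetical and are not taken from the FLIP or from Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a state descriptor: just a name and a value type.
final class StateDescriptor<T> {
    final String name;
    final Class<T> type;

    StateDescriptor(String name, Class<T> type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical registry that the system hands to the user function.
final class StateDescriptorRegistry {
    private final Map<String, StateDescriptor<?>> descriptors = new LinkedHashMap<>();

    void register(StateDescriptor<?> descriptor) {
        // Reject duplicate names so that every state is declared exactly once.
        if (descriptors.putIfAbsent(descriptor.name, descriptor) != null) {
            throw new IllegalStateException("State already declared: " + descriptor.name);
        }
    }

    Map<String, StateDescriptor<?>> declared() {
        return descriptors;
    }
}

// The callback the user implements; the system, not the user, invokes it.
interface DeclaresKeyedState {
    void registerKeyedState(StateDescriptorRegistry registry);
}

// Example user function that declares two states up front.
final class CountAndSumFunction implements DeclaresKeyedState {
    static final StateDescriptor<Long> COUNT = new StateDescriptor<>("count", Long.class);
    static final StateDescriptor<Double> SUM = new StateDescriptor<>("sum", Double.class);

    @Override
    public void registerKeyedState(StateDescriptorRegistry registry) {
        registry.register(COUNT);
        registry.register(SUM);
    }
}

final class EagerRegistrationDemo {
    // What the system side might do when the function is handed over.
    static StateDescriptorRegistry collect(DeclaresKeyedState function) {
        StateDescriptorRegistry registry = new StateDescriptorRegistry();
        function.registerKeyedState(registry);
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(collect(new CountAndSumFunction()).declared().keySet());
    }
}
```

With this shape the set of declared states is known before the first record is processed, and the user never has to call a registration method directly.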

On 04.07.2017 20:00, Tzu-Li (Gordon) Tai wrote:

> [...]



Re: [DISCUSS] FLIP-22: Eager State Declaration

wenlong.lwl
Hi all, we have jobs which create state according to the type of the key and a
dynamic configuration:

e.g. key_type_1's aggregation function is average, while key_type_2's is sum

We need to create state dynamically because the aggregation function may
change at runtime, and different aggregation functions may need different
kinds of state to persist their data. It is really hard to declare state eagerly.

In the FLIP, I think the main reason for proposing eager state declaration
is to make sure that all states are registered when restoring.
How about just persisting the state descriptors in the state handle and
automatically registering the states on restore?
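
A rough sketch of that alternative, under heavy simplification: state is still created lazily, but the snapshot carries the descriptor information together with the data, so a restore can re-register every state before user code runs. All class and method names below are hypothetical, and a "descriptor" is reduced to a string naming the aggregation kind; this does not reflect how Flink's state handles are actually laid out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified backend: state name -> descriptor info and per-key values.
final class LazyStateBackend {
    private final Map<String, String> descriptors = new HashMap<>();
    private final Map<String, Map<String, Double>> values = new HashMap<>();

    // Lazy registration: the state comes into existence on first access.
    Map<String, Double> getOrCreate(String stateName, String aggregationKind) {
        descriptors.putIfAbsent(stateName, aggregationKind);
        return values.computeIfAbsent(stateName, n -> new HashMap<>());
    }

    boolean isRegistered(String stateName) {
        return descriptors.containsKey(stateName);
    }

    String descriptorOf(String stateName) {
        return descriptors.get(stateName);
    }

    // The snapshot carries the descriptors together with the data.
    Snapshot snapshot() {
        Map<String, Map<String, Double>> copy = new HashMap<>();
        values.forEach((name, perKey) -> copy.put(name, new HashMap<>(perKey)));
        return new Snapshot(new HashMap<>(descriptors), copy);
    }

    // Restore re-registers every state found in the snapshot.
    static LazyStateBackend restore(Snapshot snapshot) {
        LazyStateBackend backend = new LazyStateBackend();
        backend.descriptors.putAll(snapshot.descriptors);
        snapshot.values.forEach((name, perKey) -> backend.values.put(name, new HashMap<>(perKey)));
        return backend;
    }

    static final class Snapshot {
        final Map<String, String> descriptors;
        final Map<String, Map<String, Double>> values;

        Snapshot(Map<String, String> descriptors, Map<String, Map<String, Double>> values) {
            this.descriptors = descriptors;
            this.values = values;
        }
    }
}

final class RestoreDemo {
    // Create two states lazily, snapshot, and restore into a fresh backend.
    static LazyStateBackend roundTrip() {
        LazyStateBackend backend = new LazyStateBackend();
        backend.getOrCreate("key_type_1", "average").put("user-a", 2.5);
        backend.getOrCreate("key_type_2", "sum").put("user-b", 7.0);
        return LazyStateBackend.restore(backend.snapshot());
    }

    public static void main(String[] args) {
        LazyStateBackend restored = roundTrip();
        System.out.println(restored.isRegistered("key_type_1") + " " + restored.descriptorOf("key_type_2"));
    }
}
```

The sketch only shows that registration can be recovered from the snapshot itself; whether that is enough for the features the FLIP has in mind is exactly the open question.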

On 5 July 2017 at 03:53, Chesnay Schepler <[hidden email]> wrote:

> [...]

Re: [DISCUSS] FLIP-22: Eager State Declaration

SHI Xiaogang
In reply to this post by Chesnay Schepler-3
Hi Tzu-Li,

Thanks for the proposal. The changes are great. I have several questions
about some details.

First, do you have any plan to provide a method to remove states? Now
states can only be created (either lazily or eagerly), but cannot be
removed. We cannot remove those states not registered because they may be
accessed later (with those deprecated methods).

Second, what about exposing namespaces to users? Now namespaces are only
used in window streams and all user states are in the void namespace. But
some users may come across similar scenarios to window streams where states
are closely related to arrived records and cannot be known beforehand.
Since namespaces are not exposed, they have to create new states when new
records arrive. MapState is another choice, but will be less efficient in
some cases. If we can expose namespaces to users, these users may benefit
from eagerly declared states. I think the change will not break existing
interfaces.

Looking forward to your comments.

Regards,
Xiaogang






Re: [DISCUSS] FLIP-22: Eager State Declaration

Gyula Fóra
Hi all,

Thanks for the proposal. I definitely see the value in making eager
registration a requirement for several of the features that you mentioned
(mostly related to state serialization/format etc.).

The only problem I see, which has been mentioned by others, is that the lack
of lazy (dynamic) state registration might be a blocker for some complex
application logic. In these cases users would have to register more generic
state types at initialization than otherwise necessary, making the programs
less efficient.

Maybe it would still make sense to support lazy state registration but not
allow migration of these states in the future? On the other hand, I don't
feel very strongly about this, as eager registration should cover most
use cases, with workarounds for the more complex cases.

Gyula

On Wed, 5 Jul 2017 at 4:36, SHI Xiaogang <[hidden email]> wrote:

> [...]

Re: Re: [DISCUSS] FLIP-22: Eager State Declaration

hzyuemeng1
In reply to this post by wenlong.lwl
pengwenlong?

2017-07-12

hzyuemeng1



From: "wenlong.lwl" <[hidden email]>
Sent: 2017-07-05 10:27
Subject: Re: [DISCUSS] FLIP-22: Eager State Declaration
To: "dev" <[hidden email]>
Cc:

[...]

Re: Re: [DISCUSS] FLIP-22: Eager State Declaration

Aljoscha Krettek-2
Hi,

First of all, I like the idea of eagerly defined state for user functions; this makes for a very nice API, as can be seen in the Beam state API. I also agree with previous posters that completely banning lazy state makes certain use cases very tricky to implement, basically putting the burden of lazy state evaluation on the user by forcing them to use a generic MapState where they manually keep track of lazily allocated new state types. (Or some such thing.)

With this out of the way, I have some remarks on the new APIs:

What would be the fate of this style of method for getting state on KeyedStateBackend:

<N, S extends State> S getPartitionedState(
      N namespace,
      TypeSerializer<N> namespaceSerializer,
      StateDescriptor<S, ?> stateDescriptor) throws Exception;

This is used, for example by the WindowOperator to retrieve state. The issue is that the WindowOperator is not actually aware of the type of state, it just needs to know that it is an AppendingState so that it can put elements into state and retrieve them when firing. This is used for supporting different kinds of windows (with ListState, ReducingState, FoldingState, and AggregatingState) without having extra code in the WindowOperator for that.
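
To illustrate the point, here is a small self-contained sketch. The Appending interface is a stand-in with the same add/get shape as AppendingState; MiniWindow, ListAppending and ReducingAppending are hypothetical simplifications and not Flink classes. The "operator" only ever sees the common interface, so the same code works for list-style and reducing-style state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Stand-in with the add/get shape of AppendingState.
interface Appending<IN, OUT> {
    void add(IN value);

    OUT get();
}

// List-like variant: keeps every element.
final class ListAppending<T> implements Appending<T, List<T>> {
    private final List<T> elements = new ArrayList<>();

    @Override
    public void add(T value) {
        elements.add(value);
    }

    @Override
    public List<T> get() {
        return new ArrayList<>(elements);
    }
}

// Reducing variant: folds elements into a single value as they arrive.
final class ReducingAppending<T> implements Appending<T, T> {
    private final BinaryOperator<T> reducer;
    private T current;

    ReducingAppending(BinaryOperator<T> reducer) {
        this.reducer = reducer;
    }

    @Override
    public void add(T value) {
        current = (current == null) ? value : reducer.apply(current, value);
    }

    @Override
    public T get() {
        return current;
    }
}

// Simplified "window operator": it never inspects the concrete state type.
final class MiniWindow<IN, OUT> {
    private final Appending<IN, OUT> contents;

    MiniWindow(Appending<IN, OUT> contents) {
        this.contents = contents;
    }

    void onElement(IN element) {
        contents.add(element);
    }

    OUT fire() {
        return contents.get();
    }
}

final class MiniWindowDemo {
    // Feeds the elements 1, 2, 3 into a window backed by the given state.
    static <OUT> OUT run(Appending<Integer, OUT> state) {
        MiniWindow<Integer, OUT> window = new MiniWindow<>(state);
        for (int i = 1; i <= 3; i++) {
            window.onElement(i);
        }
        return window.fire();
    }

    public static void main(String[] args) {
        System.out.println(run(new ListAppending<Integer>()));
        System.out.println(run(new ReducingAppending<Integer>(Integer::sum)));
    }
}
```

Any eager declaration scheme would need to keep this kind of type-agnostic access possible for internal operators.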

How would registering a state that uses a function work with the annotation-based user API? I think this is problematic:

@KeyedState(
    stateId = "reducing-state",
    typeSerializerFactory = MySerializer.class,
    function = ?
)
private ReducingState<MyPojo> reducingState;

The reason why this is problematic is that annotations only allow constants as parameters and in Flink users usually specify objects (anonymous inner, or lambda functions) when specifying a user function. Also, we cannot have a generic type here, so there’s no way of using Java type checking for verifying that a user specifies a ReduceFunction<T> for a ReducingState<T>.

As a side note, Beam is getting around this problem by only putting the state name in the annotation and the rest in a state spec, like this:

@StateId("stateId")
private final StateSpec<CombiningState<Double, CountSum<Double>, Double>> combiningState =
    StateSpecs.combining(new Mean.CountSumCoder<Double>(), Mean.<Double>of());
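
A plain-Java sketch of that split, to make the idea concrete. The annotation @DeclaredState, the ReducingSpec class and the SpecScanner below are hypothetical names invented for illustration; they are neither Beam nor Flink APIs. The annotation carries only a constant id, while the function sits in an ordinary typed field, so the compiler can check it against the state's element type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical annotation: carries only a constant id.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DeclaredState {
    String value();
}

// Hypothetical spec object: the function lives here, so its type is checked against T.
final class ReducingSpec<T> {
    final Class<T> type;
    final BinaryOperator<T> reduceFunction;

    ReducingSpec(Class<T> type, BinaryOperator<T> reduceFunction) {
        this.type = type;
        this.reduceFunction = reduceFunction;
    }
}

// Example user function: id in the annotation, everything else in the spec.
final class SumFunction {
    @DeclaredState("reducing-state")
    private final ReducingSpec<Long> sumSpec = new ReducingSpec<>(Long.class, Long::sum);
}

final class SpecScanner {
    // Collects all annotated ReducingSpec fields of a user function via reflection.
    static Map<String, ReducingSpec<?>> scan(Object userFunction) {
        Map<String, ReducingSpec<?>> specs = new LinkedHashMap<>();
        for (Field field : userFunction.getClass().getDeclaredFields()) {
            DeclaredState id = field.getAnnotation(DeclaredState.class);
            if (id != null && ReducingSpec.class.isAssignableFrom(field.getType())) {
                try {
                    field.setAccessible(true);
                    specs.put(id.value(), (ReducingSpec<?>) field.get(userFunction));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read state spec: " + field.getName(), e);
                }
            }
        }
        return specs;
    }

    public static void main(String[] args) {
        System.out.println(scan(new SumFunction()).keySet());
    }
}
```

Declaring the field as ReducingSpec<Long> with a function over a different type would fail at compile time, which is the check the pure annotation form cannot provide.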

What do you think?

Best,
Aljoscha


> On 12. Jul 2017, at 08:37, hzyuemeng1 <[hidden email]> wrote:
>
> [...]