[Discussion] Flink Pulsar Connector

Sijie Guo
Hi Flinkers,

As discussed with @tzulitai at apache/flink#5845
<https://github.com/apache/flink/pull/5845>, I am starting a discussion
thread about contributing Flink Pulsar connectors (both source and sink
connectors) from the Pulsar community to the Flink project. We'd like to
hear people's thoughts on this and how we can proceed.

For those who don't know Apache Pulsar, here is some background:

---

Apache Pulsar (incubating) <https://pulsar.incubator.apache.org/> is a
distributed pub/sub messaging system that provides a very flexible
messaging model, unifying traditional queuing (e.g. SQS, RabbitMQ) and
high-performance streaming (e.g. Kinesis, Kafka) into one pub/sub messaging
model and API. It is backed by Apache BookKeeper, a scalable segment/log
storage system, which provides unbounded stream storage for Pulsar. Because
of its segment-centric architecture, Pulsar provides compelling unbounded
streaming data storage. It is suitable for both streaming and batch
processing, which I believe fits very well into Flink's data processing
model. Besides that, Pulsar's upcoming 2.0 release includes a lot of
advanced features, such as a built-in schema registry, topic compaction,
regex subscription, and tiered storage
<https://github.com/apache/incubator-pulsar/wiki/PIP-17:-Tiered-storage-for-Pulsar-topics>
 ...
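
To make the "one log, queuing and streaming on top of it" idea a bit more concrete, here is a minimal toy sketch in Python. It is not Pulsar's client API and not how the proposed connector works; `Topic`, `Subscription`, and the "exclusive"/"shared" mode strings are hypothetical names chosen for illustration, loosely modeled on Pulsar's exclusive and shared subscription types. It assumes each subscription keeps its own cursor over a single append-only log: an exclusive subscription with one consumer reads the log in order (streaming-style), while a shared subscription spreads messages roughly round-robin across its consumers (queue-style).

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy topic (hypothetical, not Pulsar's API): a single append-only log."""

    def __init__(self):
        self.log = []

    def publish(self, msg):
        self.log.append(msg)


class Subscription:
    """Toy subscription with its own cursor into the topic's shared log.

    mode "exclusive": one consumer reads every message in order (stream-like).
    mode "shared": messages are spread round-robin over consumers (queue-like).
    """

    def __init__(self, topic, mode):
        if mode not in ("exclusive", "shared"):
            raise ValueError(f"unknown mode: {mode}")
        self.topic = topic
        self.mode = mode
        self.cursor = 0  # index of the next unread message in topic.log
        self.consumers = []

    def attach(self, name):
        # An exclusive subscription allows only one attached consumer.
        if self.mode == "exclusive" and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(name)

    def dispatch(self):
        """Deliver all unread messages and return {consumer: [messages]}."""
        if not self.consumers:
            return {}
        delivered = defaultdict(list)
        targets = cycle(self.consumers)  # a single consumer gets everything
        while self.cursor < len(self.topic.log):
            delivered[next(targets)].append(self.topic.log[self.cursor])
            self.cursor += 1
        return dict(delivered)


topic = Topic()
for i in range(4):
    topic.publish(f"m{i}")

stream = Subscription(topic, "exclusive")
stream.attach("analytics")
queue = Subscription(topic, "shared")
queue.attach("worker-a")
queue.attach("worker-b")

print(stream.dispatch())  # {'analytics': ['m0', 'm1', 'm2', 'm3']}
print(queue.dispatch())   # {'worker-a': ['m0', 'm2'], 'worker-b': ['m1', 'm3']}
```

Both subscriptions see all four messages because each keeps its own cursor, while the data itself is stored once. Real Pulsar additionally handles acknowledgements, redelivery, and durable storage in BookKeeper, none of which this toy models.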

Pulsar has been developed at Yahoo since around 2012 and has been running in
production for 4+ years, across 10+ data centers, processing/delivering
billions of messages per day. It was open sourced in 2016. Since then, it
has been adopted by various companies. Nowadays, the Pulsar Slack channel
is very active and fast-growing. The community currently has about 15
committers.

---

I happened to work with ZongYang (who is also a Pulsar contributor) on
developing Pulsar connectors for Flink to satisfy Pulsar users' requests. We
would like to contribute the connector work to Flink and continue the
collaboration between the Flink and Pulsar communities. From the Pulsar
community's perspective, we are very committed to developing Pulsar's
ecosystem, and willing and dedicated to developing/maintaining the Flink
Pulsar connectors.

I hope this email thread gives you enough background on Pulsar and clears
up some of the concerns that @tzulitai raised in the JIRA ticket / pull
request. Looking forward to any feedback from the Pulsar community and to a
deep collaboration between the Flink and Pulsar communities.

Also /cc the Pulsar dev mailing list ([hidden email]). If
there are any questions, Pulsar devs can also help to answer.

Thanks,
Sijie

Re: [Discussion] Flink Pulsar Connector

Henry Saputra
Here is the link to the Apache Flink JIRA issue for this:

https://issues.apache.org/jira/browse/FLINK-9168

- Henry

On Fri, Apr 20, 2018 at 12:08 AM, Sijie Guo <[hidden email]> wrote:

> [...]

Re: [Discussion] Flink Pulsar Connector

Fabian Hueske
Hi Sijie, hi Pulsar community!

Thanks for the detailed overview of Pulsar.
I like the idea of adding a Pulsar connector to Flink.

As Gordon mentioned, the Flink community would like to ensure that the
connector is maintained after being added.
Our experience is that connector maintenance, including fixing bugs,
upgrading to newer versions, adding new features, and reviewing
contributions, results in a lot of work for the community.
Since you assured us that the Pulsar community is eager to help with this
effort, I think we could add the connector to Flink.
This would also be a good initiative for a tighter collaboration between
the two ASF projects.

Best, Fabian

2018-04-22 7:02 GMT+02:00 Henry Saputra <[hidden email]>:

> [...]

Re: [Discussion] Flink Pulsar Connector

Sijie Guo
Hi Fabian,

Awesome, glad to hear back from the Flink community! I hope this is a good
starting point for a tighter collaboration between the two communities.

Thank you,
Sijie

On Tue, Apr 24, 2018 at 2:32 AM, Fabian Hueske <[hidden email]> wrote:

> [...]

Re: [Discussion] Flink Pulsar Connector

Tzu-Li (Gordon) Tai
Hi Sijie,

Thanks for initiating the discussion.

Given that it looks promising that this connector will have good eyes on it after it is merged,
+1 to proceeding with the PR and reviewing it as a connector contribution to Flink.

Cheers,
Gordon

On 25 April 2018 at 4:07:48 AM, Sijie Guo ([hidden email]) wrote:

> [...]

Re: [Discussion] Flink Pulsar Connector

Sijie Guo
Awesome! Thanks, Gordon!

- Sijie

On Wed, Apr 25, 2018 at 12:02 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> [...]