[jira] [Commented] (FLINK-834) Extend writeAsText with custom formatting function.


JIRA <jira@apache.org>

    [ https://issues.apache.org/jira/browse/FLINK-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028351#comment-14028351 ]

Fabian Hueske commented on FLINK-834:

Exactly, it would be syntactic sugar for inserting a formatting MapFunction<InType, String> before the sink.
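
To illustrate the "map, then sink" idea, here is a minimal sketch in plain Java that does not depend on Flink. Everything in it is an assumption made for illustration: `FormattedTextSketch`, `Pair`, `formatAll`, and the list-based `writeAsText` are hypothetical stand-ins, and `TextFormatter` only mirrors the interface name proposed in the issue. It is not how Flink implements this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormattedTextSketch {

    /** Hypothetical formatter, modeled on the TextFormatter proposed in the issue. */
    public interface TextFormatter<T> {
        String format(T input);
    }

    /** Minimal stand-in for a two-field tuple; this is NOT Flink's Tuple2. */
    public static final class Pair<A, B> {
        public final A f0;
        public final B f1;

        public Pair(A f0, B f1) {
            this.f0 = f0;
            this.f1 = f1;
        }
    }

    /** The "map" step: turn every element into a String using the formatter. */
    public static <T> List<String> formatAll(List<T> elements, TextFormatter<T> formatter) {
        List<String> lines = new ArrayList<>();
        for (T element : elements) {
            lines.add(formatter.format(element));
        }
        return lines;
    }

    /** The "sink" step: an unchanged text writer that emits one line per String. */
    public static String writeAsText(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            out.append(line).append('\n');
        }
        return out.toString();
    }

    /** The sugar: format first, then hand the Strings to the plain text sink. */
    public static <T> String writeAsFormattedText(List<T> elements, TextFormatter<T> formatter) {
        return writeAsText(formatAll(elements, formatter));
    }

    public static void main(String[] args) {
        List<Pair<String, Integer>> data = Arrays.asList(
                new Pair<String, Integer>("a", 1),
                new Pair<String, Integer>("b", 2));
        System.out.print(writeAsFormattedText(data, p -> p.f0 + " -> " + p.f1));
    }
}
```

The point of the sketch is that the text sink itself stays untouched: the convenience method only composes a formatting step with the existing writer, so users who need something more elaborate can still apply their own map beforehand.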

> Extend writeAsText with custom formatting function.
> ---------------------------------------------------
>                 Key: FLINK-834
>                 URL: https://issues.apache.org/jira/browse/FLINK-834
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
> Currently, `writeAsText` uses the `toString()` method of data types to serialize the output as text. Alternatively, we have a CSV format that writes Tuple DataSets by using the `toString()` methods of the individual fields. Since Tuple's `toString()` method cannot be adapted without extending the class, it is not easy to define a custom output format for data sets that include Tuples.
> I think it would be good to have a way to explicitly format a text output.
> We could add a formatting function that returns a String for an input element, such as
> ```
> DataSet<Tuple2<String, MyPojo>> myDS;
> myDS.writeAsFormattedText("hdfs:///myOutPath",
>   new TextFormatter<Tuple2<String, MyPojo>>() {
>     @Override
>     public String format(Tuple2<String, MyPojo> input) {
>       return input.f0 + " -> " +
>              input.f1.getWhatEver() + "  and " +
>              input.f1.getSomethingElse();
>     }
>   });
> ```
> Internally, we would use the default TextOutputFormat, preceded by a Map operator that applies the formatting.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/834
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, simple-issue, user satisfaction
> Milestone: Release 0.6 (unplanned)
> Created at: Mon May 19 14:39:51 CEST 2014
> State: open

This message was sent by Atlassian JIRA