---------- Forwarded message ----------
From: Francesco Viscomi <[EMAIL PROTECTED]>
Date: 2017-09-13 18:37 GMT+02:00
Subject: possible a bug?
To: [EMAIL PROTECTED]
Hi all,

I'm trying to extract the content from a web page, and I found the following
example on the internet:

=======START CODE======

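import java.io.InputStream;
import java.net.URL;

// Assumes Commons Lang 2.x, which provides StringEscapeUtils.escapeHtml.
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.LinkContentHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.ToHTMLContentHandler;
import org.xml.sax.ContentHandler;

String url = "http://www.bbc.com/news/uk-england-41255962";

URL _url = new URL(url);
InputStream input = _url.openStream();

// The tee forwards the same SAX events to all three handlers:
// links, plain body text, and re-serialized HTML.
LinkContentHandler linkHandler = new LinkContentHandler();
ContentHandler textHandler = new BodyContentHandler();
ToHTMLContentHandler toHTMLHandler = new ToHTMLContentHandler();
TeeContentHandler teeHandler = new TeeContentHandler(linkHandler, textHandler, toHTMLHandler);

Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
HtmlParser parser = new HtmlParser();

parser.parse(input, teeHandler, metadata, parseContext);

// Only the plain-text handler's output is used below.
String content = StringEscapeUtils.escapeHtml(textHandler.toString());

// "il contenuto" is Italian for "the content".
System.out.println("il contenuto   " + content);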
=======END CODE========
But the output is useless; part of it is shown here:
===============START OUTPUT==================

 Accessibility links

         Skip to content

        Accessibility Help

      BBC iD

        Notifications

    BBC navigation

          Home

        Home

        News

        News

        Sport

        Weather

        Shop
==============END PART OF OUTPUT=============

How can I understand why this happens, and how can I solve it? For some
other web pages, for example
http://www.vogella.com/tutorials/AndroidTestingEspresso/article.html,
it works well.
Can you please help me?
Thanks very much

--
Ing. Viscomi Francesco
