
Allow to download with wget
ClosedPublic

Authored by dereckson on Feb 29 2016, 04:18.
Referenced Files
F3762413: D307.id785.diff
Thu, Nov 21, 09:28
F3762412: D307.id782.diff
Thu, Nov 21, 09:28
F3762411: D307.id724.diff
Thu, Nov 21, 09:28
F3762410: D307.id719.diff
Thu, Nov 21, 09:28
Subscribers
None

Details

Summary

There is an HTTP redirection loop when file_get_contents, curl or
Guzzle is used to fetch a page from the Le Soir website, but not with wget.

We therefore implement a fallback solution that downloads the page with wget.
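
As an illustration of the fallback idea only, here is a minimal sketch of fetching a page by shelling out to wget. The function names buildWgetCommand and downloadWithWget are hypothetical and are not taken from pages/DownloadWithWget.php; the sketch assumes wget is available on the PATH and that shell_exec is not disabled.

```php
<?php
// Hypothetical helper: builds the wget command line for a URL.
// -q silences progress output, "-O -" writes the document to standard output.
function buildWgetCommand(string $url): string {
    // escapeshellarg() quotes the URL so it is passed as a single argument.
    return 'wget -q -O - ' . escapeshellarg($url);
}

// Hypothetical helper: returns the page body, or null when nothing was fetched.
function downloadWithWget(string $url): ?string {
    $output = shell_exec(buildWgetCommand($url));
    // shell_exec() gives null or false on failure or when there is no output.
    return (is_string($output) && $output !== '') ? $output : null;
}
```

Quoting the URL with escapeshellarg is the part that matters for safety whenever the URL comes from user input.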

Test Plan

Diff Detail

Repository
rSTG Source templates generator
Lint
No Lint Coverage
Unit
No Test Coverage
Branch
lesoir-wget
Build Status
Buildable 398
Build 479: arc lint + arc unit

Event Timeline

dereckson retitled this revision from to Allow to download with wget.
dereckson updated this object.
dereckson edited the test plan for this revision. (Show Details)
dereckson added a reviewer: xcombelle.
pages/DownloadWithWget.php
23

Is that safe?

33

This overloads Page::get_data, and so must currently be named like this.

A further change will rename every method to CamelCase.

Regression: this commit introduces the following issue when the page IS NOT downloaded through this new method:

Warning: array_key_exists() expects parameter 2 to be array, null given in /usr/home/dereckson/dev/nasqueron/tools/3rdparty/source-templates-generator/page.php on line 237

The Page::meta_tags member isn't set, as Page::analyse isn't called anymore.

Thanks to Scoopfinder for noticing the issue.
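
To illustrate the failure mode, here is a small sketch in which a meta_tags member stays null until analyse runs, so array_key_exists receives null. The class PageSketch, its getMetaTag method and the sample tag are hypothetical stand-ins rather than the real Page class, and the lazy guard shown is one possible defence, not the fix applied in this revision.

```php
<?php
// Hypothetical stand-in for Page, reduced to the members involved in the warning.
class PageSketch {
    /** @var array|null Stays null until analyse() has run. */
    public $meta_tags = null;

    // Stand-in for the parsing step; the real method extracts tags from the page.
    public function analyse(): void {
        $this->meta_tags = ['author' => 'Example author'];
    }

    // Looks up a meta tag, running the analysis first if it has not happened yet.
    public function getMetaTag(string $name): ?string {
        if (!is_array($this->meta_tags)) {
            // Without this guard, array_key_exists() would be given null.
            $this->analyse();
        }
        return array_key_exists($name, $this->meta_tags)
            ? $this->meta_tags[$name]
            : null;
    }
}
```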

page.php
124

Add $data = $this->data;.

Fixed issue reported by Scoopfinder.

dereckson added a reviewer: dereckson.
dereckson marked 2 inline comments as done.

@xcombelle confirmed on #wikipedia-fr the code is safe as far as security is concerned.

They also noted that curl works with a wget user-agent, so Le Soir may maintain a whitelist: once a request has been made correctly, further requests are no longer filtered at one of the steps.
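
The observation about the user-agent could be tried from PHP along these lines. This is a sketch under assumptions: the function names and the user-agent string "Wget/1.17 (linux-gnu)" are placeholders chosen for illustration, the curl extension must be loaded, and whether Le Soir accepts such a request has not been verified here.

```php
<?php
// Hypothetical helper: curl options that mimic a wget request.
// The default user-agent value is an assumed placeholder, not a confirmed string.
function wgetLikeCurlOptions(string $userAgent = 'Wget/1.17 (linux-gnu)'): array {
    return [
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 10,    // give up instead of looping forever
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
    ];
}

// Hypothetical helper: returns the page body, or null if the request failed.
function fetchWithWgetUserAgent(string $url): ?string {
    $handle = curl_init($url);
    curl_setopt_array($handle, wgetLikeCurlOptions());
    $body = curl_exec($handle);
    curl_close($handle);
    return is_string($body) ? $body : null;
}
```

Capping redirects with CURLOPT_MAXREDIRS means a redirection loop ends in a failed request rather than a hang.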

We should perhaps contact the Le Soir operations team to inquire about this issue.

This revision is now accepted and ready to land. Mar 7 2016, 20:22
This revision was automatically updated to reflect the committed changes.