
Allow to download with wget
Closed, Public

Authored by dereckson on Feb 29 2016, 04:18.

Details

Summary

There is an HTTP redirection loop when file_get_contents, curl or
Guzzle is used to fetch a page of Le Soir website, but not with wget.

We therefore implement a fallback solution that downloads the page with wget.
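
A minimal sketch of what such a fallback could look like follows. It is not the code of pages/DownloadWithWget.php from this revision: the function names build_wget_command and download_with_wget are hypothetical, and it assumes a Unix-like host where wget is available on the PATH.

```php
<?php
// Hypothetical helpers for illustration; not the code from this revision.

// Builds the shell command used to fetch a URL with wget.
function build_wget_command(string $url): string {
    // -q silences progress output, -O - writes the document to stdout,
    // and -- ends option parsing so a URL starting with "-" is not read as a flag.
    // escapeshellarg() quotes the URL so it cannot inject shell syntax.
    return 'wget -q -O - -- ' . escapeshellarg($url);
}

// Runs wget and returns the downloaded document as a string.
function download_with_wget(string $url): string {
    $output = shell_exec(build_wget_command($url));
    if (!is_string($output) || $output === '') {
        throw new RuntimeException("wget returned no data for $url");
    }
    return $output;
}
```

Keeping the command construction in its own function makes the escaping easy to review in isolation, which matters because the URL ends up in a shell command line.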

Test Plan

Diff Detail

Repository: rSTG Source templates generator
Lint: No Lint Coverage
Unit: No Test Coverage
Branch: lesoir-wget
Build Status: Buildable 436
Build 544: arc lint + arc unit

Event Timeline

dereckson retitled this revision to Allow to download with wget.
dereckson updated this object.
dereckson edited the test plan for this revision. (Show Details)
dereckson added a reviewer: xcombelle.
pages/DownloadWithWget.php
Line 23:

Is that safe?

Line 33:

This overrides Page::get_data, and so must currently be named like this.

A further change will rename every method to CamelCase.

Regression: this commit introduces the following issue when the page IS NOT downloaded through this new method:

Warning: array_key_exists() expects parameter 2 to be array, null given in /usr/home/dereckson/dev/nasqueron/tools/3rdparty/source-templates-generator/page.php on line 237

The Page::meta_tags member isn't set as Page::analyse isn't called anymore.
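
The failure mode can be reproduced with a reduced sketch. The class PageSketch and its get_meta_tag method are hypothetical stand-ins, not the actual page.php code; only the meta_tags member and the analyse method are named after the ones mentioned above. The is_array guard is one defensive option, not the fix that was applied in this revision.

```php
<?php
// Hypothetical reduction of the regression; not the actual page.php code.
class PageSketch {
    public $data = '';
    public $meta_tags = null; // only populated by analyse()

    // Extracts <meta name="..." content="..."> pairs from the page data.
    function analyse() {
        $this->meta_tags = [];
        $pattern = '/<meta\s+name="([^"]+)"\s+content="([^"]*)"/i';
        if (preg_match_all($pattern, $this->data, $matches, PREG_SET_ORDER)) {
            foreach ($matches as $match) {
                $this->meta_tags[$match[1]] = $match[2];
            }
        }
    }

    // Returns the value of a meta tag, or null when it is unknown.
    function get_meta_tag($name) {
        // Without this guard, array_key_exists() receives null when
        // analyse() was never called: a warning in PHP 5 and 7,
        // a TypeError in PHP 8.
        if (!is_array($this->meta_tags)) {
            return null;
        }
        return array_key_exists($name, $this->meta_tags)
            ? $this->meta_tags[$name]
            : null;
    }
}
```

A guard like this only hides the symptom. The underlying problem is still that the analysis step is skipped, so restoring the call path is the more complete remedy.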

Thanks to Scoopfinder for noticing the issue.

page.php
Lines 124–129:

Add $data = $this->data;.

Fixed issue reported by Scoopfinder.

dereckson added a reviewer: dereckson.
dereckson marked 2 inline comments as done.

@xcombelle confirmed on #wikipedia-fr the code is safe as far as security is concerned.

They also noted that curl works with a wget user-agent, so maybe Le Soir has a whitelist: once a request has been made correctly, further requests aren't filtered in one of the steps.
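
The user-agent hypothesis could be checked from PHP with a sketch along these lines. The function names and the exact User-Agent string are assumptions chosen for illustration (the string only imitates the format wget sends), and the sketch requires the PHP cURL extension.

```php
<?php
// Hypothetical helpers for illustration; requires the PHP cURL extension.

// Returns cURL options that imitate wget's User-Agent header.
// The default version string is an assumption, not taken from the revision.
function wget_like_curl_options(string $userAgent = 'Wget/1.17.1 (linux-gnu)'): array {
    return [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, as wget does
        CURLOPT_MAXREDIRS      => 5,    // give up early if the redirection loop persists
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Fetches a URL with curl while presenting a wget-like User-Agent.
function fetch_as_wget(string $url): string {
    $ch = curl_init($url);
    curl_setopt_array($ch, wget_like_curl_options());
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("curl failed for $url: " . curl_error($ch));
    }
    return $body;
}
```

If such a request succeeds where the default curl user-agent loops, that would support the idea that the site filters on the User-Agent header. It would not by itself confirm the whitelist explanation.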

We should perhaps contact the Le Soir operations team to inquire about this issue.

This revision is now accepted and ready to land. Mar 7 2016, 20:22
This revision was automatically updated to reflect the committed changes.