Closed

Build a Website

This project received 43 bids from talented freelancers with an average bid price of $772 USD.

Project Budget
$250 - $750 USD
Total Bids
43
Project Description

Hi there.

I need someone to write a program that automatically scrapes data from this site: [url removed, login to view]

Not all of the data.

First, look at the category list on the right side of the site.

I only need the data from certain categories.

Some of them are under [恋愛・少女漫画] (romance / shoujo manga), from [ちはやふる] to [すきっていいなよ。].

The rest of them are under [ファンタジー漫画] (fantasy manga), from [終りのセラフ] to [クレイモア].

You have to collect the data in exactly the format described below.

Data Structure

id----order

url----URL of the article

cat----category

title----title

magazine----magazine

author----author

genre----genre

character----character

site----target website

article----the article body text

entry_data_at----publish time

created_at----catch time

picture----the cover image of the article

*Check tip1, tip2, tip3
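The field list above can be sketched as a simple record type. This is a minimal sketch assuming a Python scraper; the field names follow the list exactly (including the posting's own spelling `entry_data_at`), everything else is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Article:
    """One scraped article, with the fields from the list above."""
    id: int             # sequential order: 1, 2, 3, ...
    url: str            # URL of the article
    cat: str            # category name
    title: str          # article title
    magazine: str       # magazine
    author: str         # author
    genre: str          # comma-separated genres
    character: str      # e.g. "33巻173首"
    site: str           # target website
    article: str        # article body text
    entry_data_at: int  # publish time, Unix timestamp
    created_at: int     # scrape time, Unix timestamp
    picture: str        # URL of the cover image
```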

[cat] means the name of the category. In the category list on the right side you can see words like [ちはやふる] and [黒崎くんの言いなりになんてならない]; those are categories.

[title]: click the category [ちはやふる] and a new page opens, where you can see words such as [ちはやふる33巻173首のネタバレ感想] or [ちはやふる33巻172首のネタバレ感想]; those are titles.

[article]: click one title, like [ちはやふる33巻173首のネタバレ感想], and a new page opens showing an article with a lot of text. You have to capture the body, from the title (ちはやふる33巻173首のネタバレ感想) to the end of the article (ending just above [目次], [コメント], and the advertisements).
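Cutting the body "from the title to just above [目次] / [コメント]" can be sketched with plain string slicing. The `entry-title` class comes from the sample record later in the posting; the end markers are taken from the description above, and everything else is an assumption:

```python
def extract_body(html: str) -> str:
    """Return the article HTML from the <h1 class="entry-title"> heading
    down to just before the [目次] / [コメント] sections."""
    start = html.find('<h1 class="entry-title">')
    if start == -1:
        return ""
    body = html[start:]
    # Truncate at whichever end marker appears first.
    for marker in ("目次", "コメント"):
        pos = body.find(marker)
        if pos != -1:
            body = body[:pos]
    return body
```

A real scraper would also need to strip the advertisement blocks; this sketch only shows the slicing logic.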

[entry_data_at] means the publish time of the article. For example, the publish time of ちはやふる33巻173首のネタバレ感想 is the one written under the title (2016/10/[url removed, login to view]). You have to record it as a Unix timestamp, which would turn 2016/10/01 into 1451577600.

[url] means the URL of the article, like [url removed, login to view].

[site] is always written as [url removed, login to view].

[character]: for example, on [url removed, login to view], under the advertisements there is a [目次] section. You can see [33巻173首] written in black with no link; that is the [character].

For [author], [magazine], [genre], [picture], [id], and [created_at], do the following step first.

Search for the [cat] on [url removed, login to view] and use the first result.

For example, searching [ちはやふる] on [url removed, login to view] gives:

作家:末次由紀

雑誌・レーベル: BE・LOVE

ジャンル: スポーツ / 少女マンガ / アニメ化 / 映画化

So,

[author] means the words after [作家:]. In the example, the [author] is [末次由紀].

[magazine] means the words after [雑誌・レーベル:]. In the example, the [magazine] is [BE・LOVE].

[genre] means the words after [ジャンル:], separated by ",". In the example, the [genre] is [スポーツ,少女マンガ,アニメ化,映画化].
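Pulling the three values out of lines like the example above can be sketched as follows. The label strings (作家, 雑誌・レーベル, ジャンル) are taken from the example; the function itself is an assumed helper, not part of the posting:

```python
def parse_info(lines):
    """Parse '作家:', '雑誌・レーベル:', and 'ジャンル:' lines into fields."""
    info = {}
    for line in lines:
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label == "作家":
            info["author"] = value
        elif label == "雑誌・レーベル":
            info["magazine"] = value
        elif label == "ジャンル":
            # Genres appear separated by " / " on the page;
            # store them comma-separated, as the posting requires.
            info["genre"] = ",".join(g.strip() for g in value.split("/"))
    return info
```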

[picture] means the cover image of the first search result. You have to capture the covers and store them. In the database there should be a [picture] column holding the URL of each cover.

[id] means the order: the first one is 1, the second one is 2, etc.

[created_at] means the time you scraped the article, also recorded as a Unix timestamp. For example, if I scrape the data at UTC+08:00 2016/10/11 14:40:30, then [created_at] should be 1476168030.
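The timestamp conversion above (UTC+08:00 2016/10/11 14:40:30 → 1476168030) can be reproduced like this. The UTC+8 offset and the date format are assumptions taken from the examples in the posting:

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # UTC+08:00, as in the example

def to_timestamp(text: str, fmt: str = "%Y/%m/%d %H:%M:%S") -> int:
    """Convert a date string in UTC+8 to a Unix timestamp."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=TZ).timestamp())
```

The same helper with `fmt="%Y/%m/%d"` covers date-only publish times for [entry_data_at].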

Using [ちはやふる] as the example and following the steps above, you get:

id:1

url:[url removed, login to view]

cat:ちはやふる

title:ちはやふる33巻173首のネタバレ感想

magazine:BE・LOVE

author:末次由紀

genre:スポーツ,少女マンガ,アニメ化,映画化

character:33巻173首

site:[url removed, login to view]

article:<h1 class="entry-title">..........

picture: ...

entry_data_at:1451577600

created_at:1476168030

*Check the database sample photo

This is what I need. You have to do it this way so that my server can recognize the data.

The data must be collected once every 2 hours. You also have to send me the program you wrote to collect the data.
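Running the scrape once every 2 hours can be sketched with a plain loop. Here `scrape_once` is a hypothetical placeholder for the actual scraping routine, not anything defined in the posting:

```python
import time

INTERVAL = 2 * 60 * 60  # two hours, in seconds

def run_forever(scrape_once):
    """Call scrape_once() every two hours, accounting for run time."""
    while True:
        started = time.time()
        scrape_once()
        # Sleep out the remainder of the 2-hour window.
        time.sleep(max(0, INTERVAL - (time.time() - started)))
```

In production, a cron entry (`0 */2 * * *`) would do the same job without a long-running process.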

Please type 1234 in your bid.
