Python script for crawling API stops for some reason - make suggestions for improvement

  • Status: Closed
  • Prize: $20
  • Entries received: 3
  • Winner: RedLayers

Contest summary

Dear all,

We're using the script below to make requests through the crawling provider proxycrawl.com (the documentation is available here once you have created a free account: https://proxycrawl.com/dashboard/docs).

The script works well in general, but one problem remains: it simply stops working from time to time, sometimes after successfully crawling a couple of hundred URLs, sometimes only after a couple of thousand. We can't get it to run stably across a few tens of thousands of URLs.

Please make suggestions right in the code - including a comment that describes why you made the change. We'll then test it and award the amount if the change brings the desired result.

Looking forward to your contributions!
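
The script itself is not reproduced on this page, so the following is only a minimal sketch of one common cause of the behaviour described above: a request without a timeout that hangs forever, or an unhandled exception that ends the loop. It uses only the Python standard library. The names `fetch_once` and `fetch_with_retry`, the 30-second timeout, and the retry and backoff values are assumptions for illustration, not taken from the original script or from the ProxyCrawl documentation.

```python
import time
import urllib.request


def fetch_once(url, timeout=30):
    """Fetch one URL; the timeout keeps a dead connection from blocking forever."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, fetch=fetch_once, retries=3, backoff=2.0, sleep=time.sleep):
    """Call `fetch(url)` up to `retries` times, waiting longer after each failure.

    Returns the response body, or None if every attempt failed, so the
    caller can log the URL and continue with the next one.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: one bad URL must not end the run
            print(f"attempt {attempt}/{retries} failed for {url}: {exc!r}")
            if attempt < retries:
                sleep(backoff * attempt)  # wait 2s, then 4s, ... (assumed values)
    return None
```

In a crawl loop, a `None` result would be written to a list of failed URLs instead of stopping the run, which also makes it possible to see afterwards which requests caused trouble.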

Employer feedback

“Mario is a great guy and a pleasure to work with!”

thomasjohn6, Germany.

Public clarification board

  • imo581
    • 4 years ago

    I tried your script with some links, and the API responds with status code 403 Forbidden. When I call the API from a browser, it returns this message: "Token is invalid or account is temporarily blocked! please login to your dashboard for more details". Is something wrong with your subscription?

    1. thomasjohn6
      Contest holder
      • 4 years ago

      Hello Islam, Thanks for your interest in the contest! I guess for somewhat obvious reasons, before posting the script in public, I removed the real token from the script :-)

  • busygayan
    • 4 years ago

    It literally makes no sense to pay for a third-party service like this, and their prices are pretty high.

    Why don't you build your own small system to get this done?
    It's nothing complicated.

    1. busygayan
      • 4 years ago

      So 40 bucks, plus you need a server that can handle 50K plain requests per hour? So, to answer the question:

      ProxyCrawl cost: 2,500 USD (basic, not JavaScript)
      Custom approach cost: less than 400 USD (with a 64 GB / 16 vCPU server)

      JavaScript-based crawl on ProxyCrawl: $5,054.90
      Custom approach cost: less than 1,000 USD (192 GB of RAM, 32 vCPU server)

      Besides all that, the code is custom, it's transparent, and debugging is much easier.
      Your data is private.

    2. busygayan
      • 4 years ago

      I have a bot that crawls Facebook daily with over 1,000 concurrent accounts. It is custom coded using Selenium with Python, and I make over 100 requests each second (each request has its own unique IP / proxy). Still, I spend only around 2,000 on a monthly basis.

      This makes no sense, and the customer is technically being ripped off, paying almost 5x the amount. The customer is still stuck having to debug his own code, and I'm not even going to go into why the code fails. You could pay a couple of engineers a salary and have your own servers maintained with zero issues for the amount you spend on this company. Even if you're doing this on a small scale, it makes no sense.

      High cohesion is not bad at all, that's my point basically.
      Good Luck

  • thomasjohn6
    Contest holder
    • 4 years ago

    Thanks for your comment! For now, however, we would like to keep the convenience of such a provider. Maybe we'll do it on our own later. So do you have any idea what the problem in the script could be? Thanks in advance!

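
The 403 response reported by imo581 points to a second thing a long crawl may want to handle explicitly: telling apart a status that will never succeed on retry (an invalid or blocked token) from one that is probably transient. The sketch below shows one possible policy. The mapping of status codes to actions is an assumption, not ProxyCrawl's documented behaviour, and `classify_status`, `crawl`, and `get_status` are hypothetical names.

```python
def classify_status(status_code):
    """Map an HTTP status code to an action for the crawl loop (assumed policy)."""
    if status_code in (401, 403):
        return "fatal"   # bad or blocked token: retrying cannot help
    if status_code == 429 or 500 <= status_code <= 599:
        return "retry"   # rate limit or server error: likely transient
    if 200 <= status_code <= 299:
        return "ok"
    return "skip"        # anything else: record it and move on


def crawl(urls, get_status):
    """Run `get_status(url)` for each URL and stop early on a fatal status.

    `get_status` is a placeholder for whatever call performs the request
    and returns the numeric status code.
    """
    outcomes = {}
    for url in urls:
        action = classify_status(get_status(url))
        if action == "fatal":
            raise RuntimeError(f"stopping at {url}: token rejected or account blocked")
        outcomes[url] = action
    return outcomes
```

Stopping loudly on a fatal status, instead of silently skipping, makes a blocked account visible right away and avoids spending the remaining requests on calls that cannot succeed.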
