Hello,
Since the file is local, you don't need a scraper; what you need is a way to read a very large file on your own machine. I could write some Spark scripts for you so that you can query the file as if it were a database and get your data as output. The scripts would be highly customisable: they would accept custom queries, infer the schema automatically, and write the results to files in different formats.
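To give a rough idea, here is a minimal sketch of what such a script could look like in PySpark. It assumes the file is a CSV with a header row, which may not match your data. The function names, the table name `data`, and the list of output formats are placeholders of mine, not a finished design.

```python
# Sketch only: assumes a CSV input with a header row and a local Spark install.
# All names below (functions, the "data" table, the format list) are placeholders.
SUPPORTED_OUTPUTS = {"csv", "json", "parquet", "orc"}


def check_output_format(fmt):
    """Normalise the requested output format and reject unknown ones."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_OUTPUTS:
        raise ValueError(f"unsupported output format: {fmt}")
    return fmt


def query_local_file(input_path, sql, output_path, output_format="parquet"):
    """Load a local CSV, run a SQL query over it, and write the result to disk."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    output_format = check_output_format(output_format)
    spark = (
        SparkSession.builder
        .master("local[*]")  # run on this machine, using all available cores
        .appName("local-file-query")
        .getOrCreate()
    )
    try:
        # inferSchema makes Spark scan the file to guess column types.
        df = spark.read.csv(input_path, header=True, inferSchema=True)
        # Expose the file as a table called "data" so plain SQL can query it.
        df.createOrReplaceTempView("data")
        result = spark.sql(sql)
        # Spark writes a directory of part files at output_path.
        result.write.mode("overwrite").format(output_format).save(output_path)
    finally:
        spark.stop()
```

A call would look something like `query_local_file("/path/to/file.csv", "SELECT * FROM data LIMIT 10", "/path/to/output", "json")`, where both paths are made up for illustration.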
Let me know if you wish to take this forward.
Thanks
R