Perl parsing logfiles

software development

#1

I'm going to be using Perl to parse some logfiles that live on an FTP server of mine. These logfiles can and probably will get large, so parsing them will take some processor and memory for a couple of minutes at a time… I understand that on shared hosting, this could be a tad troublesome.

The question is: if I set this script to parse only once every couple of days, at midnight, would everything be fine?
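
For reference, this is roughly the crontab entry I had in mind, firing at midnight every third day (the script path is just a placeholder):

    # run the log parser at 00:00 every third day (path is a placeholder)
    0 0 */3 * * /usr/bin/perl /home/myuser/bin/parse_logs.pl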


#2

It depends on how big the files are and how much memory overhead they'll need. Unless you're talking really big files, I don't see a problem; Perl is pretty good at this sort of thing. :)
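
As a rough sketch (assuming plain text logs with whitespace-separated fields, which may not match yours), reading line by line keeps memory use flat no matter how big the file gets:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read the log one line at a time instead of slurping the whole
    # file into memory, so usage stays flat regardless of file size.
    my $file = 'access.log';    # placeholder filename
    open my $fh, '<', $file or die "Can't open $file: $!";

    my %hits;    # e.g. count requests per IP (first whitespace field)
    while (my $line = <$fh>) {
        my ($ip) = split ' ', $line;
        $hits{$ip}++ if defined $ip;
    }
    close $fh;

    print "$_: $hits{$_}\n" for sort keys %hits;

The hash only grows with the number of distinct IPs, not with the size of the file, which is what keeps this friendly on a shared box.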

Remember that your midnight is not my midnight, and the server's traffic is pretty much constant around the clock, or so I would imagine. So there's never really a good time to do this.

I believe DreamHost has a program which places caps on memory usage per process, so if you exceed this quota the process will be killed anyway. I don't think you run the risk of bringing the server to its knees.

A few suggestions, if I may: download the files and process them offline on your own servers. After all, you shouldn't really be using a production server for this kind of work.
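
A rough sketch of the download step using Net::FTP, which ships with Perl (host, login and paths below are all placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::FTP;

    # Fetch the logfile from the FTP server so the heavy parsing can
    # happen on your own machine instead of the shared host.
    my $ftp = Net::FTP->new('ftp.example.com')
        or die "Can't connect: $@";
    $ftp->login('user', 'password') or die "Login failed: ", $ftp->message;
    $ftp->binary;
    $ftp->get('/logs/access.log', 'access.log')
        or die "Get failed: ", $ftp->message;
    $ftp->quit;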

Or chop the files into multiple batches and work on the smaller batch files to get your results without consuming too much overhead.
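
One way to do the chopping in Perl itself, splitting on line count rather than bytes so log records stay intact (the sizes and filenames are just examples):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Split a big log into numbered chunk files of 50,000 lines each,
    # so each batch can be parsed separately.
    my $chunk_lines = 50_000;
    my ($count, $chunk_num, $out) = (0, 0, undef);

    open my $in, '<', 'access.log' or die "Can't open access.log: $!";
    while (my $line = <$in>) {
        if ($count % $chunk_lines == 0) {
            close $out if $out;
            $chunk_num++;
            open $out, '>', sprintf('chunk_%03d.log', $chunk_num)
                or die "Can't write chunk: $!";
        }
        print {$out} $line;
        $count++;
    }
    close $out if $out;
    close $in;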

Hope this helps.

  • wil

#3

[quote]if you exceed this quota the process will be killed anyway. I don't think you run the risk of bringing the server to its knees.[/quote]

This is true to some degree, but our system is fairly lenient when it can be. A process can severely overload a server before we ‘catch’ it, so we still depend to some degree on the kindness of our customers. As long as everyone realizes that our hosting servers are a shared resource, it’s usually okay.

[quote]A few suggestions, if I may: download the files and process them offline on your own servers. After all, you shouldn't really be using a production server for this kind of work.[/quote]

This is the best option, but…

[quote]Or chop the files into multiple batches and work on the smaller batch files to get your results without consuming too much overhead.[/quote]

…this is probably okay too. Break the file into XX MB chunks, and do the processing in 'spurts'.
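
Something along these lines, where you handle a batch of lines and then sleep so the load arrives in short bursts instead of one long spike (the batch size and pause are arbitrary, and the actual parsing is left as a stub):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Process the log in 'spurts': handle a batch of lines, then pause
    # briefly so the CPU load is spread out. Tune the numbers to taste.
    my $batch = 10_000;
    my $pause = 5;        # seconds between batches

    open my $fh, '<', 'access.log' or die "Can't open access.log: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        # ... do the actual parsing of $line here ...
        sleep $pause if ++$n % $batch == 0;
    }
    close $fh;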

I’m not sure what kind of logs are being processed here, or how large the files will be. If they’re pretty small, this is all really sort of academic.

  • Jeff @ DreamHost
  • DH Discussion Forum Admin